Abstract:Multimodal neuroimaging modeling has become a widely used approach but confronts considerable challenges due to heterogeneity, which encompasses variability in data types, scales, and formats across modalities. This variability necessitates advanced computational methods to integrate and interpret these diverse datasets within a cohesive analytical framework. In our research, we combine functional magnetic resonance imaging (fMRI), diffusion tensor imaging (DTI), and structural MRI (sMRI) into a single framework. This integration capitalizes on the unique strengths of each modality and their inherent interconnections, aiming for a comprehensive understanding of the brain's connectivity and anatomical characteristics. Utilizing the Glasser atlas for parcellation, we integrate imaging-derived features from the various modalities within consistent regions: functional connectivity from fMRI, structural connectivity from DTI, and anatomical features from sMRI. Our approach incorporates a masking strategy to differentially weight neural connections, thereby facilitating a holistic integration of multimodal imaging data. This technique enhances interpretability at the connectivity level, going beyond traditional analyses centered on single regional attributes. The model is applied to the Human Connectome Project's Development study to elucidate the associations between multimodal imaging and cognitive functions throughout youth. The analysis demonstrates improved predictive accuracy and uncovers crucial anatomical features and essential neural connections, deepening our understanding of brain structure and function.
Abstract:Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. The exceptions are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high-quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.
Abstract:Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorders. However, combining complementary information from these two modalities is challenging due to their heterogeneity. Many existing methods fall short of capturing the interaction between these modalities, frequently defaulting to a simple combination of latent features. In this paper, we propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF), which aims to capture both intra-modal and inter-modal relationships between fMRI and sMRI, enhancing multi-modal data representation. Specifically, our CAMF framework employs self-attention modules to identify interactions within each modality, while cross-attention modules identify interactions between modalities. Subsequently, our approach optimizes the integration of latent features from both modalities. This approach significantly improves classification accuracy, as demonstrated by our evaluations on two extensive multi-modal brain imaging datasets, where CAMF consistently outperforms existing methods. Furthermore, the gradient-guided Score-CAM is applied to interpret critical functional networks and brain regions involved in schizophrenia. The biomarkers identified by CAMF align with established research, potentially offering new insights into the diagnosis and pathological endophenotypes of schizophrenia.
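To make the cross-attention idea concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention, in which queries come from one modality and keys and values from the other. It is not the CAMF architecture: the token counts, dimensions, random projection matrices, and the names `cross_attention`, `fmri_tokens`, and `smri_tokens` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention with queries from one modality
    and keys/values from another (hypothetical helper)."""
    q = queries @ w_q            # (n_query, d_head)
    k = context @ w_k            # (n_context, d_head)
    v = context @ w_v            # (n_context, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 10 fMRI tokens and 6 sMRI tokens, 16 features each.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
fmri_tokens = rng.normal(size=(10, d_model))
smri_tokens = rng.normal(size=(6, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

# fMRI tokens attend to sMRI tokens.
fused, attn = cross_attention(fmri_tokens, smri_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Self-attention is the special case in which `queries` and `context` are the same array, which is how intra-modal interactions would be modeled in this sketch.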
Abstract:Functional connectivity (FC) as derived from fMRI has emerged as a pivotal tool in elucidating the intricacies of various psychiatric disorders and delineating the neural pathways that underpin cognitive and behavioral dynamics inherent to the human brain. While Graph Neural Networks (GNNs) offer a structured approach to represent neuroimaging data, they are limited by their need for a predefined graph structure to depict associations between brain regions, a detail not solely provided by FCs. To bridge this gap, we introduce the Gated Graph Transformer (GGT) framework, designed to predict cognitive metrics based on FCs. Empirical validation on the Philadelphia Neurodevelopmental Cohort (PNC) underscores the superior predictive prowess of our model, further accentuating its potential in identifying pivotal neural connectivities that correlate with human cognitive processes.
Abstract:We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without depth information, regressing appropriate plane locations is a challenging problem. To solve this issue, we pre-partition the disparity space into bins and design a disparity sampler to regress local offsets for multiple planes in each bin. However, using such a sampler alone prevents the network from converging; we therefore propose two optimization strategies that are combined with the different disparity distributions of the datasets, and an occlusion-aware reprojection loss as a simple yet effective geometric supervision technique. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module to address the problem of applying self-attention to large feature maps. We demonstrate the effectiveness of our approach and generate state-of-the-art results on different datasets. Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 73.9%-83.5%. We also evaluate the performance on real-world images and demonstrate the benefits.
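The bin-and-offset idea can be sketched as follows. This is only an illustration under stated assumptions, not LoLep's sampler: the bins are assumed to be uniform over a disparity range, the offsets are assumed to lie in [0, 1] (for example after a sigmoid), and the function name `plane_disparities` is hypothetical. In the actual method the offsets are regressed by a network rather than supplied by hand.

```python
import numpy as np


def plane_disparities(offsets, d_min, d_max):
    """Map per-bin local offsets in [0, 1] to plane disparities.

    offsets: array of shape (n_bins, planes_per_bin).
    The range [d_min, d_max] is split into n_bins uniform bins
    (an assumption), and each offset places one plane inside its bin.
    """
    offsets = np.asarray(offsets, dtype=float)
    n_bins = offsets.shape[0]
    edges = np.linspace(d_min, d_max, n_bins + 1)
    lower = edges[:-1, None]                    # lower edge of each bin
    width = (edges[1:] - edges[:-1])[:, None]   # width of each bin
    return lower + offsets * width


# Two bins over [0, 1], two planes per bin.
planes = plane_disparities([[0.0, 0.5], [0.25, 0.75]], d_min=0.0, d_max=1.0)
print(planes)
```

Because every plane is confined to its own bin, the planes stay spread over the whole disparity range while still adapting locally to the scene.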
Abstract:It can be difficult to identify trends and perform quality control in large, high-dimensional fMRI or omics datasets. To remedy this, we develop ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level features. The tool allows visualization of phenotype correlation with functional connectivity (FC), partial connectivity (PC), dictionary components (PCA and our own method), and genomic data (single-nucleotide polymorphisms, SNPs). In addition, it allows visualization of weights from arbitrary ML models. ImageNomer is built with a Python backend and a Vue frontend. We validate ImageNomer using the Philadelphia Neurodevelopmental Cohort (PNC) dataset, which contains multitask fMRI and SNP data of healthy adolescents. Using correlation, greedy selection, or model weights, we find that a set of 10 FC features can explain 15% of variation in age, compared to 35% for the full 34,716-feature model. The four most significant FCs are either between bilateral default mode network (DMN) regions or spatially proximal subcortical areas. Additionally, we show that whereas both FC (fMRI) and SNP (genomic) features can account for 10-15% of intelligence variation, this predictive ability disappears when controlling for race. We find that FC features can be used to predict race with 85% accuracy, compared to 78% accuracy for sex prediction. Using ImageNomer, this work casts doubt on the possibility of finding unbiased intelligence-related features in fMRI and SNPs of healthy adolescents.
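As a rough illustration of where a feature count such as 34,716 can come from, the sketch below computes FC as the Pearson correlation between region time series and keeps the unique off-diagonal entries. The 264-region parcellation is an assumption, not stated in the abstract; it is used here because 264 × 263 / 2 = 34,716. The time series are random placeholders, and `fc_features` is a hypothetical helper, not part of ImageNomer.

```python
import numpy as np


def fc_features(timeseries):
    """Return the FC matrix and its unique off-diagonal entries.

    timeseries: array of shape (n_regions, n_timepoints); each row is
    one region's signal. FC is taken to be the Pearson correlation.
    """
    fc = np.corrcoef(timeseries)
    upper = np.triu_indices(fc.shape[0], k=1)  # strictly upper triangle
    return fc, fc[upper]


# Placeholder data: 264 regions (assumed), 120 time points (arbitrary).
rng = np.random.default_rng(0)
n_regions, n_timepoints = 264, 120
ts = rng.normal(size=(n_regions, n_timepoints))

fc, features = fc_features(ts)
print(fc.shape, features.shape)
```

Only the upper triangle is kept because the correlation matrix is symmetric with a unit diagonal, so the remaining entries carry no additional information.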
Abstract:Existing refinement methods gradually lose their ability to further improve pose estimation methods' accuracy. In this paper, we propose a new refinement pipeline, Keypoint Refinement with Fusion Network (KRF), for 6D pose estimation, especially for objects with serious occlusion. The pipeline consists of two steps. It first completes the input point clouds via a novel point completion network. The network uses both local and global features, considering the pose information during point completion. Then, it registers the completed object point cloud with the corresponding target point cloud by Color supported Iterative KeyPoint (CIKP). The CIKP method introduces color information into registration and registers the point cloud around each keypoint to increase stability. The KRF pipeline can be integrated with existing popular 6D pose estimation methods, e.g. the full flow bidirectional fusion network, to further improve their pose estimation accuracy. Experiments show that our method improves on the state-of-the-art method from 93.9% to 94.4% on the YCB-Video dataset and from 64.4% to 66.8% on the Occlusion LineMOD dataset. Our source code is available at https://github.com/zhanhz/KRF.
Abstract:We present a novel self-supervised algorithm named MotionHint for monocular visual odometry (VO) that takes motion constraints into account. A key aspect of our approach is to use an appropriate motion model that can help existing self-supervised monocular VO (SSM-VO) algorithms to overcome issues related to the local minima within their self-supervised loss functions. The motion model is expressed with a neural network named PPnet. It is trained to coarsely predict the next pose of the camera and the uncertainty of this prediction. Our self-supervised approach combines the original loss and the motion loss, which is the weighted difference between the prediction and the generated ego-motion. Taking two existing SSM-VO systems as our baseline, we evaluate our MotionHint algorithm on the standard KITTI benchmark. Experimental results show that our MotionHint algorithm can be easily applied to existing open-sourced state-of-the-art SSM-VO systems to greatly improve the performance by reducing the resulting ATE by up to 28.73%.
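One way to picture the combined objective is sketched below. The abstract only states that the motion loss is a weighted difference between the predicted pose and the generated ego-motion; the inverse-variance weighting, the 6-vector pose representation, the balancing factor `lam`, and the function names are assumptions made for illustration and are not taken from MotionHint.

```python
import numpy as np


def motion_loss(predicted_pose, predicted_sigma, generated_pose, eps=1e-6):
    """Uncertainty-weighted squared difference between two poses.

    Assumption: each pose is a 6-vector (translation + rotation) and the
    weight of each component is the inverse of its predicted variance.
    """
    pred = np.asarray(predicted_pose, dtype=float)
    sigma = np.asarray(predicted_sigma, dtype=float)
    gen = np.asarray(generated_pose, dtype=float)
    weights = 1.0 / (sigma ** 2 + eps)
    return float(np.mean(weights * (pred - gen) ** 2))


def total_loss(original_loss, m_loss, lam=0.1):
    """Original self-supervised loss plus the motion term (lam is assumed)."""
    return original_loss + lam * m_loss


pred = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]   # pose predicted by the motion model
gen = [1.0, 2.0, 4.0, 0.0, 0.0, 0.0]    # ego-motion from the VO network
confident = motion_loss(pred, np.ones(6), gen)
uncertain = motion_loss(pred, 2.0 * np.ones(6), gen)
print(confident, uncertain, total_loss(0.5, confident))
```

With this weighting, a disagreement counts for less when the motion model reports high uncertainty, so a coarse prediction nudges the optimization without overriding the original loss.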
Abstract:In stable coronary artery disease (CAD), reduction in mortality and/or myocardial infarction with revascularization over medical therapy has not been reliably achieved. Coronary arteries are usually extracted to perform stenosis detection. We aim to develop an automatic deep learning algorithm to extract coronary arteries from invasive coronary angiograms (ICAs). In this study, a multi-input and multi-scale (MIMS) U-Net with a two-stage recurrent training strategy was proposed for automatic vessel segmentation. Incorporating features such as the Inception residual module with depth-wise separable convolutional layers, the proposed model generated a refined prediction map with the following two training stages: (i) Stage I coarsely segmented the major coronary arteries from pre-processed single-channel ICAs and generated the probability map of vessels; (ii) in Stage II, a three-channel image consisting of the original pre-processed image, a generated probability map, and an edge-enhanced image generated from the pre-processed image was fed to the proposed MIMS U-Net to produce the final segmentation probability map. During the training stage, the probability maps were iteratively and recurrently updated by feeding them back into the neural network. After segmentation, an arterial stenosis detection algorithm was developed to extract vascular centerlines and calculate arterial diameters to evaluate the stenotic level. Experimental results demonstrated that the proposed method achieved an average Dice score of 0.8329, an average sensitivity of 0.8281, and an average specificity of 0.9979 on our dataset of 294 ICAs obtained from 73 patients. Moreover, our stenosis detection algorithm achieved a true positive rate of 0.6668 and a positive predictive value of 0.7043.
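For readers unfamiliar with the reported segmentation metrics, the following sketch computes the Dice score, sensitivity, and specificity from a pair of binary masks using their standard definitions. The toy masks and the function name `segmentation_metrics` are illustrative only; the sketch assumes both masks contain at least one vessel pixel and one background pixel, so no division by zero occurs.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Dice, sensitivity, and specificity for binary masks.

    Assumes the target has at least one positive and one negative pixel.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)      # vessel pixels correctly detected
    fp = np.sum(pred & ~target)     # background marked as vessel
    fn = np.sum(~pred & target)     # vessel pixels missed
    tn = np.sum(~pred & ~target)    # background correctly rejected
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(dice), float(sensitivity), float(specificity)


# Toy 2x4 masks: 1 marks a vessel pixel.
target = [[1, 1, 0, 0], [0, 0, 0, 0]]
pred = [[1, 0, 1, 0], [0, 0, 0, 0]]
dice, sens, spec = segmentation_metrics(pred, target)
print(dice, sens, spec)
```

Vessels occupy a small fraction of an angiogram, so specificity is dominated by the many background pixels and tends to sit close to 1, which is why Dice and sensitivity are usually the more informative numbers.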
Abstract:Objective: Multi-modal functional magnetic resonance imaging (fMRI) can be used to make predictions about individual behavioral and cognitive traits based on brain connectivity networks. Methods: To take advantage of complementary information from multi-modal fMRI, we propose an interpretable multi-modal graph convolutional network (MGCN) model, incorporating the fMRI time series and the functional connectivity (FC) between each pair of brain regions. Specifically, our model learns a graph embedding from individual brain networks derived from multi-modal data. A manifold-based regularization term is then enforced to consider the relationships of subjects both within and between modalities. Furthermore, we propose gradient-weighted regression activation mapping (Grad-RAM) and edge mask learning to interpret the model, which are used to identify significant cognition-related biomarkers. Results: We validate our MGCN model on the Philadelphia Neurodevelopmental Cohort to predict individual Wide Range Achievement Test (WRAT) scores. Our model obtains superior predictive performance over GCN with a single modality and other competing approaches. The identified biomarkers are cross-validated across different approaches. Conclusion and Significance: This paper develops a new interpretable graph deep learning framework for cognitive ability prediction, with the potential to overcome the limitations of several current data-fusion models. The results demonstrate the power of MGCN in analyzing multi-modal fMRI and discovering significant biomarkers for human brain studies.