Abstract:Accurate segmentation of thalamic nuclei is important for better understanding brain function and improving disease treatment. Traditional segmentation methods often rely on a single T1-weighted image, which offers limited contrast within the thalamus. In this work, we introduce RATNUS, which uses synthetic T1-weighted images with many inversion times along with diffusion-derived features to enhance the visibility of nuclei within the thalamus. From these features, a convolutional neural network segments 13 thalamic nuclei. To enable comparison with other methods, we introduce a unified nuclei labeling scheme. Our results demonstrate an 87.19% average true positive rate (TPR) against manual labeling. In comparison, FreeSurfer and THOMAS achieve TPRs of 64.25% and 57.64%, respectively, demonstrating the superiority of RATNUS in thalamic nuclei segmentation.
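A minimal sketch of the average-TPR evaluation described above, assuming integer label volumes in which voxel value k marks nucleus k (array and function names are hypothetical):

    import numpy as np

    def true_positive_rate(pred, manual, label):
        """TPR for one nucleus: fraction of manually labeled voxels recovered by the prediction."""
        manual_mask = manual == label
        hits = np.logical_and(pred == label, manual_mask).sum()
        return hits / max(manual_mask.sum(), 1)

    def mean_tpr(pred, manual, labels=range(1, 14)):
        """Average TPR over the 13 nuclei of the unified labeling scheme."""
        return float(np.mean([true_positive_rate(pred, manual, k) for k in labels]))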
Abstract:Deep learning (DL) has led to significant improvements in medical image synthesis, enabling advanced image-to-image translation to generate synthetic images. However, DL methods face challenges such as domain shift and high demands for training data, limiting their generalizability and applicability. Historically, image synthesis was also carried out using deformable image registration (DIR), a method that warps moving images of a desired modality to match the anatomy of a fixed image. However, concerns about its speed and accuracy led to its decline in popularity. With recent advances in DL-based DIR, we revisit and reinvigorate this line of research. In this paper, we propose a fast and accurate synthesis method based on DIR. We use the task of synthesizing a rare magnetic resonance (MR) sequence, white matter nulled (WMn) T1-weighted (T1-w) images, to demonstrate the potential of our approach. During training, our method learns a DIR model based on the widely available MPRAGE sequence, which is a cerebrospinal fluid nulled (CSFn) T1-w inversion recovery gradient echo pulse sequence. During testing, the trained DIR model is first applied to estimate the deformation between moving and fixed CSFn images. Subsequently, this estimated deformation is applied to align the paired WMn counterpart of the moving CSFn image, yielding a synthetic WMn image for the fixed CSFn image. Our experiments demonstrate promising results for unsupervised image synthesis using DIR. These findings highlight the potential of our technique in contexts where supervised synthesis methods are constrained by limited training data.
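A minimal sketch of the test-time synthesis step, assuming the trained DIR model returns a dense displacement field in voxel units (dir_model, moving_csfn, moving_wmn, and fixed_csfn are hypothetical names):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def warp(volume, displacement):
        """Warp a 3D volume by a dense displacement field of shape (3, X, Y, Z)."""
        grid = np.indices(volume.shape, dtype=float)
        return map_coordinates(volume, grid + displacement, order=1)

    # phi = dir_model(moving_csfn, fixed_csfn)  # deformation estimated between CSFn images
    # synthetic_wmn = warp(moving_wmn, phi)     # reuse that deformation on the paired WMn image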
Abstract:Understanding the relationship between tongue motion patterns during speech and their resulting speech acoustic outcomes -- i.e., articulatory-acoustic relation -- is of great importance in assessing speech quality and developing innovative treatment and rehabilitative strategies. This is especially important when evaluating and detecting abnormal articulatory features in patients with speech-related disorders. In this work, we aim to develop a framework for detecting speech motion anomalies in conjunction with their corresponding speech acoustics. This is achieved through the use of a deep cross-modal translator trained on data from healthy individuals only, which bridges the gap between 4D motion fields obtained from tagged MRI and 2D spectrograms derived from speech acoustic data. The trained translator is used as an anomaly detector, by measuring the spectrogram reconstruction quality on healthy individuals or patients. In particular, the cross-modal translator is likely to generalize poorly to patient data, which contains unseen out-of-distribution patterns, and thus to yield subpar reconstruction quality compared with healthy individuals. A one-class SVM is then used to distinguish the spectrograms of healthy individuals from those of patients. To validate our framework, we collected a total of 39 paired tagged MRI and speech waveforms, consisting of data from 36 healthy individuals and 3 tongue cancer patients. We used both 3D convolutional and transformer-based deep translation models, training them on the healthy training set and then applying them to both the healthy and patient testing sets. Our framework demonstrates a capability to detect abnormal patient data, thereby illustrating its potential in enhancing the understanding of the articulatory-acoustic relation for both healthy individuals and patients.
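A minimal sketch of the anomaly-scoring stage, assuming per-sample reconstruction error is used as the one-class SVM feature (translator and the spectrogram arrays are hypothetical names; the actual feature design may differ):

    import numpy as np
    from sklearn.svm import OneClassSVM

    def recon_error_features(true_specs, pred_specs):
        """One reconstruction-error feature per sample: mean squared spectrogram error."""
        err = ((true_specs - pred_specs) ** 2).reshape(len(true_specs), -1)
        return err.mean(axis=1, keepdims=True)

    # feats_healthy = recon_error_features(specs_train, translator(motion_train))  # healthy only
    # detector = OneClassSVM(kernel="rbf", nu=0.1).fit(feats_healthy)
    # verdict = detector.predict(recon_error_features(specs_test, translator(motion_test)))
    # verdict: +1 = healthy-like reconstruction, -1 = anomalous (possible patient)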
Abstract:Magnetic Resonance Imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between $T_1$ relaxation and the repeated application of radio frequency (RF) pulses during serial imaging sequences. This is a factor that has been overlooked in prior research on tMRI post-processing. Further, we have observed an emerging trend of utilizing raw tagged MRI within a deep learning-based (DL) registration framework for motion estimation. In this work, we evaluate and analyze the impact of commonly used image similarity objectives when training DL registration models on raw tMRI. This is then compared with the Harmonic Phase-based approach, a traditional method claimed to be robust to tag fading. Our findings, derived from both simulated images and an actual phantom scan, reveal the limitations of various similarity losses in raw tMRI and emphasize caution in registration tasks where image intensity changes over time.
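To make the interplay concrete, here is a sketch of the standard spoiled-sequence approximation (an assumption; the paper's exact model is not given here): the tagged, spatially modulated component of longitudinal magnetization is attenuated each imaging interval by $T_1$ decay and by the cosine of the RF flip angle.

    import numpy as np

    def tag_fading(n_frames, tr_ms, t1_ms, flip_deg, a0=1.0):
        """Amplitude of the tag modulation before each frame: per interval it is
        scaled by cos(flip) from the RF pulse and by exp(-TR/T1) from relaxation."""
        per_frame = np.cos(np.radians(flip_deg)) * np.exp(-tr_ms / t1_ms)
        return a0 * per_frame ** np.arange(n_frames)

    # tag_fading(20, tr_ms=30.0, t1_ms=1000.0, flip_deg=10.0)  # monotonically fading curve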
Abstract:The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT) framework is based on directional product relative position bias and single-level spatial pyramid pooling, thus enabling flexible translation of variable-size weighting maps into fixed-size spectrograms without input information loss or dimension expansion. Additionally, our PLT framework efficiently models global correlations in wide matrix inputs. To improve the realism of our generated spectrograms with relatively limited training samples, we apply pair-wise utterance consistency with a Maximum Mean Discrepancy constraint and adversarial training. Experimental results on a dataset of 29 subjects speaking two utterances demonstrate that our framework can synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.
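A minimal sketch of the single-level spatial pyramid pooling idea, which lets variable-size weighting-map features feed fixed-size downstream layers (the 8x8 grid is an illustrative choice, not the paper's configuration):

    import torch
    import torch.nn.functional as F

    def single_level_spp(feat, out_hw=(8, 8)):
        """Pool a variable-size feature map (B, C, H, W) to a fixed spatial grid."""
        return F.adaptive_avg_pool2d(feat, out_hw)

    # x = torch.randn(1, 64, 37, 91)  # weighting-map features of arbitrary spatial size
    # y = single_level_spp(x)         # -> (1, 64, 8, 8), independent of the input size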
Abstract:Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With larger motions, the registration approach becomes trapped in local optima, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.
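A minimal sketch of the exponential-mapping primitive underlying such shooting methods, shown in 2D via standard scaling and squaring (a generic illustration; the paper's momenta accumulation and correction steps are not reproduced here):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def exp_map(velocity, steps=6):
        """Integrate a stationary velocity field (2, H, W) into a displacement field:
        start from the small step v / 2^K, then compose the field with itself K times."""
        disp = velocity / (2 ** steps)
        grid = np.indices(disp.shape[1:], dtype=float)
        for _ in range(steps):
            coords = grid + disp  # phi = id + u
            # (phi o phi)(x) = x + u(x) + u(x + u(x))
            disp = disp + np.stack([map_coordinates(d, coords, order=1) for d in disp])
        return disp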
Abstract:Self-training is an important class of unsupervised domain adaptation (UDA) approaches used to mitigate the problem of domain shift when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
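A minimal sketch of uncertainty-based pseudo-label filtering for a regression objective, in the heteroscedastic style where the network predicts a per-voxel log-variance (a simplified stand-in for the paper's variational Bayes scheme; the threshold and loss form are illustrative):

    import torch

    def uncertainty_weighted_l1(pred, log_var, pseudo_target, thresh=1.0):
        """Down-weight uncertain voxels and mask out those whose predicted
        aleatoric variance exceeds a reliability threshold."""
        mask = (log_var.exp() < thresh).float()            # reliable-pseudo-label mask
        loss = (pred - pseudo_target).abs() * torch.exp(-log_var) + log_var
        return (loss * mask).sum() / mask.sum().clamp(min=1)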
Abstract:Data-driven thalamic nuclei parcellation depends on high-quality manual annotations. However, the small size of and low contrast among thalamic nuclei yield annotations that are often incomplete, noisy, or ambiguously labeled. To train a robust thalamic nuclei parcellation model with noisy annotations, we propose a label propagation algorithm based on the random walker to refine the annotations before model training. A two-step model is then trained to generate first the whole thalamus mask and then the nuclei masks. We conducted experiments on a mild traumatic brain injury (mTBI) dataset with noisy thalamic nuclei annotations. Our model outperforms current state-of-the-art thalamic nuclei parcellation methods by a clear margin. We believe our method can also facilitate the training of other parcellation models with noisy labels.
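A toy 2D demonstration of the random walker primitive on which such label refinement can be built (scikit-image's implementation; the paper's propagation algorithm itself is not reproduced here). Voxels marked 0 are treated as unlabeled and receive labels propagated from trusted seeds:

    import numpy as np
    from skimage.segmentation import random_walker

    rng = np.random.default_rng(0)
    image = np.concatenate([rng.normal(0, .2, (10, 20)), rng.normal(1, .2, (10, 20))])
    seeds = np.zeros_like(image, dtype=int)  # 0 = voxels to be (re)labeled
    seeds[2, 2], seeds[-3, -3] = 1, 2        # sparse trusted annotations
    refined = random_walker(image, seeds, beta=130)  # propagate along intensity-consistent paths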
Abstract:Connectivity information derived from diffusion-weighted magnetic resonance images (DW-MRIs) plays an important role in studying human subcortical gray matter structures. However, due to the $O(N^2)$ complexity of computing the connectivity of each voxel to every other voxel (or multiple ROIs), the current practice of extracting connectivity information is highly inefficient. This makes the processing of high-resolution images and population-level analyses very computationally demanding. To address this issue, we propose a more efficient way to extract connectivity information; briefly, we consider two regions/voxels to be connected if a white matter fiber streamline passes through them -- no matter where the streamline originates. We consider the thalamus parcellation task for demonstration purposes; our experiments show that our approach brings a 30 to 120 times speedup over traditional approaches with comparable qualitative parcellation results. We also demonstrate that high-resolution connectivity features can be super-resolved from low-resolution DW-MRI in our framework. Together, these two innovations enable higher resolution connectivity analysis from DW-MRI. Our source code is available at jasonbian97.github.io/fastcod.
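A minimal sketch of the pass-through connectivity criterion, assuming streamlines are given as point arrays in voxel coordinates and ROIs as boolean masks (a simplified illustration, not the FastCOD implementation):

    import numpy as np

    def streamline_connectivity(streamlines, roi_masks, shape):
        """A streamline links every voxel it visits to every ROI it intersects,
        regardless of where the streamline was seeded."""
        conn = np.zeros((len(roi_masks),) + shape)
        for sl in streamlines:  # sl: (N, 3) array of points in voxel space
            vox = np.unique(np.round(sl).astype(int), axis=0)
            vox = vox[(vox >= 0).all(1) & (vox < shape).all(1)]  # drop out-of-volume points
            for r, mask in enumerate(roi_masks):
                if mask[tuple(vox.T)].any():
                    conn[r][tuple(vox.T)] += 1  # one count per streamline per visited voxel
        return conn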
Abstract:Accurate tongue motion estimation is essential for tongue function evaluation. The harmonic phase processing (HARP) method and the phase vector incompressible registration algorithm (PVIRA) based on HARP can generate motion estimates from tagged MR images, but they suffer from tag jumping due to large motions. This paper proposes a new registration method that combines the stationary velocity fields produced by PVIRA between successive time frames and uses the result to initialize the final registration stage, thereby avoiding tag jumping. Experimental results demonstrate that the proposed method avoids tag jumping and outperforms existing methods on tongue motion estimation.
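A minimal sketch of chaining frame-to-frame transforms into an initialization, written here as composition of displacement fields obtained after exponentiating each stationary velocity field (an assumption about the combination mechanism; conventions for field direction vary):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def compose_disp(d1, d2):
        """Compose two 2D displacement fields of shape (2, H, W): applying d2 first,
        then d1, gives result(x) = d2(x) + d1(x + d2(x))."""
        grid = np.indices(d1.shape[1:], dtype=float)
        coords = grid + d2
        return d2 + np.stack([map_coordinates(c, coords, order=1) for c in d1])

    # from functools import reduce
    # init = reduce(lambda a, b: compose_disp(b, a), frame_disps)  # chain t0 -> t1 -> ... -> tN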