Scientific Computing and Imaging Institute, University of Utah, Department of Biomedical Engineering, University of Utah
Abstract:Large vision-language models (VLMs), such as CLIP, have become foundational, demonstrating remarkable success across a variety of downstream tasks. Despite their advantages, these models, akin to other foundational systems, inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment. Prevalent datasets like ImageNet are often riddled with non-causal, spurious correlations that can diminish VLM performance in scenarios where these contextual elements are absent. This study presents an investigation into how a simple linear probe can effectively distill task-specific core features from CLIP's embedding for downstream applications. Our analysis reveals that the CLIP text representations are often tainted by spurious correlations, inherited in the biased pre-training dataset. Empirical evidence suggests that relying on visual representations from CLIP, as opposed to text embedding, is more practical to refine the skewed perceptions in VLMs, emphasizing the superior utility of visual representations in overcoming embedded biases. Our codes will be available here.
Abstract:The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior often results in \emph{pockets/holes} in the latent distribution (i.e., a failure to match the prior) and/or \emph{posterior collapse}, which is associated with a loss of information in the latent space. This paper addresses these shortcomings in VAEs by reformulating the objective function associated with VAEs in order to match the aggregate/marginal posterior distribution to the prior. We use kernel density estimate (KDE) to model the aggregate posterior in high dimensions. The proposed method is named the \emph{aggregate variational autoencoder} (AVAE) and is built on the theoretical framework of the VAE. Empirical evaluation of the proposed method on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.
Abstract:Homography estimation is a basic image-alignment method in many applications. Recently, with the development of convolutional neural networks (CNNs), some learning based approaches have shown great success in this task. However, the performance across different domains has never been researched. Unlike other common tasks (\eg, classification, detection, segmentation), CNN based homography estimation models show a domain shift immunity, which means a model can be trained on one dataset and tested on another without any transfer learning. To explain this unusual performance, we need to determine how CNNs estimate homography. In this study, we first show the domain shift immunity of different deep homography estimation models. We then use a shallow network with a specially designed dataset to analyze the features used for estimation. The results show that networks use low-level texture information to estimate homography. We also design some experiments to compare the performance between different texture densities and image features distorted on some common datasets to demonstrate our findings. Based on these findings, we provide an explanation of the domain shift immunity of deep homography estimation.
Abstract:Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a full-wave solver to generate the training data. This novel application of operator learning circumnavigates the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step in potential real-time predictions of soft tissue distribution for tumor identification in beast imaging.
Abstract:Optimal Mass Transport (OMT) is a well studied problem with a variety of applications in a diverse set of fields ranging from Physics to Computer Vision and in particular Statistics and Data Science. Since the original formulation of Monge in 1781 significant theoretical progress been made on the existence, uniqueness and properties of the optimal transport maps. The actual numerical computation of the transport maps, particularly in high dimensions, remains a challenging problem. By Brenier's theorem, the continuous OMT problem can be reduced to that of solving a non-linear PDE of Monge-Ampere type whose solution is a convex function. In this paper, building on recent developments of input convex neural networks and physics informed neural networks for solving PDE's, we propose a Deep Learning approach to solve the continuous OMT problem. To demonstrate the versatility of our framework we focus on the ubiquitous density estimation and generative modeling tasks in statistics and machine learning. Finally as an example we show how our framework can be incorporated with an autoencoder to estimate an effective probabilistic generative model.
Abstract:Noninvasive MR-guided focused ultrasound (MRgFUS) treatments are promising alternatives to the surgical removal of malignant tumors. A significant challenge is assessing the treated tissue immediately after MRgFUS procedures. Although current clinical assessment uses the immediate nonperfused volume (NPV) biomarker derived from contrast enhanced imaging, the use of contrast agent prevents continuing MRgFUS treatment if margins are not adequate. In addition, the NPV has been shown to provide variable accuracy for the true treatment outcome as evaluated by follow-up biomarkers. This work presents a novel, noncontrast, learned multiparametric MR biomarker that is conducive for intratreatment assessment. MRgFUS ablations were performed in a rabbit VX2 tumor model. Multiparametric MRI was obtained both during and immediately after the MRgFUS ablation, as well as during follow-up imaging. Segmentation of the NPV obtained during follow-up imaging was used to train a neural network on noncontrast multiparametric MR images. The NPV follow-up segmentation was registered to treatment-day images using a novel volume-conserving registration algorithm, allowing a voxel-wise correlation between imaging sessions. Contrasted with state-of-the-art registration algorithms that change the average volume by 16.8%, the presented volume-conserving registration algorithm changes the average volume by only 0.28%. After registration, the learned multiparametric MR biomarker predicted the follow-up NPV with an average DICE coefficient of 0.71, outperforming the DICE coefficient of 0.53 from the current standard of NPV obtained immediately after the ablation treatment. Noncontrast multiparametric MR imaging can provide a more accurate prediction of treated tissue immediately after treatment. Noncontrast assessment of MRgFUS procedures will potentially lead to more efficacious MRgFUS ablation treatments.
Abstract:Radiation therapy has presented a need for dynamic tracking of a target tumor volume. Fiducial markers such as implanted gold seeds have been used to gate radiation delivery but the markers are invasive and gating significantly increases treatment time. Pretreatment acquisition of a 4DCT allows for the development of accurate motion estimation for treatment planning. A deep convolutional neural network and subspace motion tracking is used to recover anatomical positions from a single radiograph projection in real-time. We approximate the nonlinear inverse of a diffeomorphic transformation composed with radiographic projection as a deep network that produces subspace coordinates to define the patient-specific deformation of the lungs from a baseline anatomic position. The geometric accuracy of the subspace projections on real patient data is similar to accuracy attained by original image registration between individual respiratory-phase image volumes.
Abstract:Given data, deep generative models, such as variational autoencoders (VAE) and generative adversarial networks (GAN), train a lower dimensional latent representation of the data space. The linear Euclidean geometry of data space pulls back to a nonlinear Riemannian geometry on the latent space. The latent space thus provides a low-dimensional nonlinear representation of data and classical linear statistical techniques are no longer applicable. In this paper we show how statistics of data in their latent space representation can be performed using techniques from the field of nonlinear manifold statistics. Nonlinear manifold statistics provide generalizations of Euclidean statistical notions including means, principal component analysis, and maximum likelihood fits of parametric probability distributions. We develop new techniques for maximum likelihood inference in latent space, and adress the computational complexity of using geometric algorithms with high-dimensional data by training a separate neural network to approximate the Riemannian metric and cometric tensor capturing the shape of the learned data manifold.
Abstract:We present an inference algorithm and connected Monte Carlo based estimation procedures for metric estimation from landmark configurations distributed according to the transition distribution of a Riemannian Brownian motion arising from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) metric. The distribution possesses properties similar to the regular Euclidean normal distribution but its transition density is governed by a high-dimensional PDE with no closed-form solution in the nonlinear case. We show how the density can be numerically approximated by Monte Carlo sampling of conditioned Brownian bridges, and we use this to estimate parameters of the LDDMM kernel and thus the metric structure by maximum likelihood.
Abstract:In this paper we develop the theory of parametric polynomial regression in Riemannian manifolds and Lie groups. We show application of Riemannian polynomial regression to shape analysis in Kendall shape space. Results are presented, showing the power of polynomial regression on the classic rat skull growth data of Bookstein as well as the analysis of the shape changes associated with aging of the corpus callosum from the OASIS Alzheimer's study.