Abstract:Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the types of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods. Further, we identify robustness shortcomings of these approaches and propose two intervention techniques leading to $1.5\times$-$4\times$ robustness improvements on three datasets, AudioSet, Kinetics-400 and ImageNet-Captions. Finally, we demonstrate that these interventions better utilize additional modalities, if present, to achieve competitive results of $44.2$ mAP on AudioSet 20K.
Abstract:Recovering high-quality images from limited sensory data is a challenging computer vision problem that has received significant attention in recent years. In particular, solutions based on deep learning, ranging from autoencoders to generative models, have been especially effective. However, comparatively little work has centered on the robustness of such reconstructions in terms of the generation of realistic image artifacts (known as hallucinations) and quantifying uncertainty. In this work, we develop experimental methods to address these concerns, utilizing a variational autoencoder-based generative adversarial network (VAE-GAN) as a probabilistic image recovery algorithm. We evaluate the model's output distribution statistically by exploring the variance, bias, and error associated with generated reconstructions. Furthermore, we perform eigen analysis by examining the Jacobians of outputs with respect to the aliased inputs to more accurately determine which input components can be responsible for deteriorated output quality. Experiments were carried out using a dataset of knee MRI images, and our results indicate that factors such as sampling rate, acquisition model, and loss function impact the model's robustness. We also conclude that a wise choice of hyperparameters can lead to the robust recovery of MRI images.
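The statistical evaluation described in the abstract above (variance, bias, and error of generated reconstructions) can be illustrated with a minimal sketch. It is not the authors' code: the function name `reconstruction_statistics`, the array shapes, and the synthetic Gaussian "reconstructions" are all assumptions made for illustration, standing in for repeated draws from a probabilistic recovery model. The sketch relies only on the standard identity that pixel-wise mean squared error equals squared bias plus variance.

```python
import numpy as np


def reconstruction_statistics(samples, reference):
    """Pixel-wise statistics over stochastic reconstructions of one input.

    samples:   array of shape (S, H, W), S draws from the model (hypothetical)
    reference: array of shape (H, W), the ground-truth image
    """
    samples = np.asarray(samples, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    mean = samples.mean(axis=0)                       # average reconstruction
    variance = samples.var(axis=0)                    # population variance (ddof=0)
    bias = mean - reference                           # systematic error per pixel
    mse = ((samples - reference) ** 2).mean(axis=0)   # total error per pixel
    return {"mean": mean, "variance": variance, "bias": bias, "mse": mse}


# Synthetic stand-in for model output: ground truth plus a constant offset
# (bias) and Gaussian noise (variance). These values are assumptions.
rng = np.random.default_rng(0)
reference = rng.random((8, 8))
samples = reference + 0.05 + 0.1 * rng.standard_normal((64, 8, 8))

stats = reconstruction_statistics(samples, reference)

# With ddof=0 the decomposition mse = bias^2 + variance holds per pixel
# up to floating-point rounding.
decomposition_gap = np.abs(
    stats["mse"] - (stats["bias"] ** 2 + stats["variance"])
).max()
```

In practice the `samples` array would hold reconstructions drawn from the trained model for a single undersampled input, and maps of `variance` and `bias` could be inspected side by side to separate uncertainty from systematic error.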
Abstract:Purpose: To develop a general phase regularized image reconstruction method, with applications to partial Fourier imaging, water-fat imaging and flow imaging. Theory and Methods: The problem of enforcing phase constraints in reconstruction was studied under a regularized inverse problem framework. A general phase regularized reconstruction algorithm was proposed to enable various joint reconstructions of partial Fourier imaging, water-fat imaging and flow imaging, along with parallel imaging (PI) and compressed sensing (CS). Since phase regularized reconstruction is inherently non-convex and sensitive to phase wraps in the initial solution, a reconstruction technique, named phase cycling, was proposed to render the overall algorithm invariant to phase wraps. The proposed method was applied to retrospectively under-sampled in vivo datasets and compared with state-of-the-art reconstruction methods. Results: Phase cycling reconstructions showed reduction of artifacts compared to reconstructions without phase cycling and achieved performance similar to state-of-the-art results in partial Fourier, water-fat and divergence-free regularized flow reconstruction. Joint reconstruction of partial Fourier + water-fat imaging + PI + CS, and partial Fourier + divergence-free regularized flow imaging + PI + CS were demonstrated. Conclusion: The proposed phase cycling reconstruction provides an alternative way to perform phase regularized reconstruction, without the need to perform phase unwrapping. It is robust to the choice of initial solutions and encourages the joint reconstruction of phase imaging applications.