Abstract:The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. As part of this Grand Challenge, a training dataset was developed based on 3D anthropomorphic breast phantoms from the VICTRE virtual imaging toolbox. A two-stage evaluation procedure consisting of a preliminary check for memorization and image quality (based on the Frechet Inception distance (FID)), and a second stage evaluating the reproducibility of image statistics corresponding to domain-relevant radiomic features was developed. A summary measure was employed to rank the submissions. Additional analyses of submissions was performed to assess DGM performance specific to individual feature families, and to identify various artifacts. 58 submissions from 12 unique users were received for this Challenge. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network, followed by another network for image superresolution. We observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
Abstract:Diffusion models have emerged as a popular family of deep generative models (DGMs). In the literature, it has been claimed that one class of diffusion models -- denoising diffusion probabilistic models (DDPMs) -- demonstrate superior image synthesis performance as compared to generative adversarial networks (GANs). To date, these claims have been evaluated using either ensemble-based methods designed for natural images, or conventional measures of image quality such as structural similarity. However, there remains an important need to understand the extent to which DDPMs can reliably learn medical imaging domain-relevant information, which is referred to as `spatial context' in this work. To address this, a systematic assessment of the ability of DDPMs to learn spatial context relevant to medical imaging applications is reported for the first time. A key aspect of the studies is the use of stochastic context models (SCMs) to produce training data. In this way, the ability of the DDPMs to reliably reproduce spatial context can be quantitatively assessed by use of post-hoc image analyses. Error-rates in DDPM-generated ensembles are reported, and compared to those corresponding to a modern GAN. The studies reveal new and important insights regarding the capacity of DDPMs to learn spatial context. Notably, the results demonstrate that DDPMs hold significant capacity for generating contextually correct images that are `interpolated' between training samples, which may benefit data-augmentation tasks in ways that GANs cannot.
Abstract:Quantitative phase retrieval (QPR) in propagation-based x-ray phase contrast imaging of heterogeneous and structurally complicated objects is challenging under laboratory conditions due to partial spatial coherence and polychromaticity. A learning-based method (LBM) provides a non-linear approach to this problem while not being constrained by restrictive assumptions about object properties and beam coherence. In this work, a LBM was assessed for its applicability under practical scenarios by evaluating its robustness and generalizability under typical experimental variations. Towards this end, an end-to-end LBM was employed for QPR under laboratory conditions and its robustness was investigated across various system and object conditions. The robustness of the method was tested via varying propagation distances and its generalizability with respect to object structure and experimental data was also tested. Although the LBM was stable under the studied variations, its successful deployment was found to be affected by choices pertaining to data pre-processing, network training considerations and system modeling. To our knowledge, we demonstrated for the first time, the potential applicability of an end-to-end learning-based quantitative phase retrieval method, trained on simulated data, to experimental propagation-based x-ray phase contrast measurements acquired under laboratory conditions. We considered conditions of polychromaticity, partial spatial coherence, and high noise levels, typical to laboratory conditions. This work further explored the robustness of this method to practical variations in propagation distances and object structure with the goal of assessing its potential for experimental use. Such an exploration of any LBM (irrespective of its network architecture) before practical deployment provides an understanding of its potential behavior under experimental settings.
Abstract:In recent years, generative adversarial networks (GANs) have gained tremendous popularity for potential applications in medical imaging, such as medical image synthesis, restoration, reconstruction, translation, as well as objective image quality assessment. Despite the impressive progress in generating high-resolution, perceptually realistic images, it is not clear if modern GANs reliably learn the statistics that are meaningful to a downstream medical imaging application. In this work, the ability of a state-of-the-art GAN to learn the statistics of canonical stochastic image models (SIMs) that are relevant to objective assessment of image quality is investigated. It is shown that although the employed GAN successfully learned several basic first- and second-order statistics of the specific medical SIMs under consideration and generated images with high perceptual quality, it failed to correctly learn several per-image statistics pertinent to the these SIMs, highlighting the urgent need to assess medical image GANs in terms of objective measures of image quality.
Abstract:Modern generative models, such as generative adversarial networks (GANs), hold tremendous promise for several areas of medical imaging, such as unconditional medical image synthesis, image restoration, reconstruction and translation, and optimization of imaging systems. However, procedures for establishing stochastic image models (SIMs) using GANs remain generic and do not address specific issues relevant to medical imaging. In this work, canonical SIMs that simulate realistic vessels in angiography images are employed to evaluate procedures for establishing SIMs using GANs. The GAN-based SIM is compared to the canonical SIM based on its ability to reproduce those statistics that are meaningful to the particular medically realistic SIM considered. It is shown that evaluating GANs using classical metrics and medically relevant metrics may lead to different conclusions about the fidelity of the trained GANs. This work highlights the need for the development of objective metrics for evaluating GANs.
Abstract:Generative adversarial networks are a kind of deep generative model with the potential to revolutionize biomedical imaging. This is because GANs have a learned capacity to draw whole-image variates from a lower-dimensional representation of an unknown, high-dimensional distribution that fully describes the input training images. The overarching problem with GANs in clinical applications is that there is not adequate or automatic means of assessing the diagnostic quality of images generated by GANs. In this work, we demonstrate several tests of the statistical accuracy of images output by two popular GAN architectures. We designed several stochastic object models (SOMs) of distinct features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect the known arrangement rules. We then tested the rates at which the different GANs correctly reproduced the rules under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear accurate visually, and correspond to low Frechet Inception Distance scores (FID), while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that while low-order ensemble statistics are largely correct, there are numerous quantifiable errors per image that plausibly can affect subsequent use of the GAN-generated images.
Abstract:In order to objectively assess new medical imaging technologies via computer-simulations, it is important to account for all sources of variability that contribute to image data. One important source of variability that can significantly limit observer performance is associated with the variability in the ensemble of objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs), which are generative models that can be employed to sample from a distribution of to-be-virtually-imaged objects. It is generally desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system, but this task has remained challenging. Deep generative neural networks, such as generative adversarial networks (GANs) hold potential for such tasks. To establish SOMs from imaging measurements, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures and GAN architectures, which limited its ability to be applied to realistically sized medical image data. To circumvent this, in this work, a modified AmbientGAN training strategy is proposed that is suitable for modern progressive or multi-resolution training approaches such as employed in the Progressive Growing of GANs and Style-based GANs. AmbientGANs established by use of the proposed training procedure are systematically validated in a controlled way by use of computer-simulated measurement data corresponding to a stylized imaging system. Finally, emulated single-coil experimental magnetic resonance imaging data are employed to demonstrate the methods under less stylized conditions.
Abstract:Medical imaging systems are commonly assessed and optimized by use of objective-measures of image quality (IQ) that quantify the performance of an observer at specific tasks. Variation in the objects to-be-imaged is an important source of variability that can significantly limit observer performance. This object variability can be described by stochastic object models (SOMs). In order to establish SOMs that can accurately model realistic object variability, it is desirable to use experimental data. To achieve this, an augmented generative adversarial network (GAN) architecture called AmbientGAN has been developed and investigated. However, AmbientGANs cannot be immediately trained by use of advanced GAN training methods such as the progressive growing of GANs (ProGANs). Therefore, the ability of AmbientGANs to establish realistic object models is limited. To circumvent this, a progressively-growing AmbientGAN (ProAmGAN) has been proposed. However, ProAmGANs are designed for generating two-dimensional (2D) images while medical imaging modalities are commonly employed for imaging three-dimensional (3D) objects. Moreover, ProAmGANs that employ traditional generator architectures lack the ability to control specific image features such as fine-scale textures that are frequently considered when optimizing imaging systems. In this study, we address these limitations by proposing two advanced AmbientGAN architectures: 3D ProAmGANs and Style-AmbientGANs (StyAmGANs). Stylized numerical studies involving magnetic resonance (MR) imaging systems are conducted. The ability of 3D ProAmGANs to learn 3D SOMs from imaging measurements and the ability of StyAmGANs to control fine-scale textures of synthesized objects are demonstrated.
Abstract:Tomographic image reconstruction is generally an ill-posed linear inverse problem. Such ill-posed inverse problems are typically regularized using prior knowledge of the sought-after object property. Recently, deep neural networks have been actively investigated for regularizing image reconstruction problems by learning a prior for the object properties from training images. However, an analysis of the prior information learned by these deep networks and their ability to generalize to data that may lie outside the training distribution is still being explored. An inaccurate prior might lead to false structures being hallucinated in the reconstructed image and that is a cause for serious concern in medical imaging. In this work, we propose to illustrate the effect of the prior imposed by a reconstruction method by decomposing the image estimate into generalized measurement and null components. The concept of a hallucination map is introduced for the general purpose of understanding the effect of the prior in regularized reconstruction methods. Numerical studies are conducted corresponding to a stylized tomographic imaging modality. The behavior of different reconstruction methods under the proposed formalism is discussed with the help of the numerical studies.
Abstract:It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs). A SOM is a generative model that can be employed to establish an ensemble of to-be-imaged objects with prescribed statistical properties. In order to accurately model variations in anatomical structures and object textures, it is desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system. Deep generative neural networks, such as generative adversarial networks (GANs) hold great potential for this task. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures, such as progressive growing, which limited its ability to be applied to realistically sized medical image data. To circumvent this, in this work, a new Progressive Growing AmbientGAN (ProAmGAN) strategy is developed for establishing SOMs from medical imaging measurements. Stylized numerical studies corresponding to common medical imaging modalities are conducted to demonstrate and validate the proposed method for establishing SOMs.