Abstract:Renewable resources are strongly dependent on local and large-scale weather situations. Skillful subseasonal to seasonal (S2S) forecasts -- beyond two weeks and up to two months -- can offer significant socioeconomic advantages to the energy sector. This study aims to enhance wind speed predictions using a diffusion model with classifier-free guidance to downscale S2S forecasts of surface wind speed. We propose DiffScale, a diffusion model that super-resolves spatial information for continuous downscaling factors and lead times. Leveraging weather priors as guidance for the generative process of diffusion models, we adopt the perspective of conditional probabilities on sampling super-resolved S2S forecasts. We aim to directly estimate the density associated with the target S2S forecasts at different spatial resolutions and lead times without auto-regression or sequence prediction, resulting in an efficient and flexible model. Synthetic experiments were designed to super-resolve wind speed S2S forecasts from the European Center for Medium-Range Weather Forecast (ECMWF) from a coarse resolution to a finer resolution of ERA5 reanalysis data, which serves as a high-resolution target. The innovative aspect of DiffScale lies in its flexibility to downscale arbitrary scaling factors, enabling it to generalize across various grid resolutions and lead times -without retraining the model- while correcting model errors, making it a versatile tool for improving S2S wind speed forecasts. We achieve a significant improvement in prediction quality, outperforming baselines up to week 3.
Abstract:Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation. Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks.
Abstract:Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging in clinical diagnosis. There is interest in reducing the radiation exposure from PET but also maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low-to-high image recovery methods. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) An adaptive residual estimation mapping mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) A self-supervised pre-training strategy to enhance the feature representation of the coarse generator. Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors.
Abstract:Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in our modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high quality images with minimum low-dose, Convolutional Neural Networks (CNNs) based methods have been developed for high quality PET synthesis from its low-dose counterparts. Previous CNNs-based studies usually directly map low-dose PET into features space without consideration of different dose reduction level. In this study, a novel approach named CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) is presented. Specifically, a multi-tasking coarse generator, guided by a classification head, allows for a more comprehensive understanding of the noise-level features present in the low-dose data, resulting in improved image synthesis. Moreover, to recover spatial details of standard PET, an auxiliary super resolution network - Contextual-Net - is proposed as a second-stage training to narrow the gap between coarse prediction and standard PET. We compared our method to the state-of-the-art methods on whole-body PET with different dose reduction factors (DRFs). Experiments demonstrate our method can outperform others on all DRF.