Super-resolution is a task in computer vision that involves increasing the resolution of an image or video by generating missing high-frequency details from low-resolution input. The goal is to produce an output image with a higher resolution than the input image, while preserving the original content and structure.
High-resolution satellite imagery is indispensable for tracking the genesis, intensification, and trajectory of tropical cyclones (TCs). However, existing deep learning-based super-resolution (SR) methods often treat satellite image sequences as generic videos, neglecting the underlying atmospheric physical laws governing cloud motion. To address this, we propose a Physics Encoded Spatial and Temporal Generative Adversarial Network (PESTGAN) for TC image super-resolution. Specifically, we design a disentangled generator architecture incorporating a PhyCell module, which approximates the vorticity equation via constrained convolutions and encodes the resulting approximate physical dynamics as implicit latent representations to separate physical dynamics from visual textures. Furthermore, a dual-discriminator framework is introduced, employing a temporal discriminator to enforce motion consistency alongside spatial realism. Experiments on the Digital Typhoon dataset for 4$\times$ upscaling demonstrate that PESTGAN establishes a better performance in structural fidelity and perceptual quality. While maintaining competitive pixel-wise accuracy compared to existing approaches, our method significantly excels in reconstructing meteorologically plausible cloud structures with superior physical fidelity.
General-purpose super-resolution models, particularly Vision Transformers, have achieved remarkable success but exhibit fundamental inefficiencies in common infrared imaging scenarios like surveillance and autonomous driving, which operate from fixed or nearly-static viewpoints. These models fail to exploit the strong, persistent spatial priors inherent in such scenes, leading to redundant learning and suboptimal performance. To address this, we propose the Regional Prior attention Transformer for infrared image Super-Resolution (RPT-SR), a novel architecture that explicitly encodes scene layout information into the attention mechanism. Our core contribution is a dual-token framework that fuses (1) learnable, regional prior tokens, which act as a persistent memory for the scene's global structure, with (2) local tokens that capture the frame-specific content of the current input. By utilizing these tokens into an attention, our model allows the priors to dynamically modulate the local reconstruction process. Extensive experiments validate our approach. While most prior works focus on a single infrared band, we demonstrate the broad applicability and versatility of RPT-SR by establishing new state-of-the-art performance across diverse datasets covering both Long-Wave (LWIR) and Short-Wave (SWIR) spectra
Super-resolution is widely used in medical imaging to enhance low-quality data, reducing scan time and improving abnormality detection. Conventional super-resolution approaches typically rely on paired datasets of downsampled and original high resolution images, training models to reconstruct high resolution images from their artificially degraded counterparts. However, in real-world clinical settings, low resolution data often arise from acquisition mechanisms that differ significantly from simple downsampling. As a result, these inputs may lie outside the domain of the training data, leading to poor model generalization due to domain shift. To address this limitation, we propose a distributional deep learning framework that improves model robustness and domain generalization. We develop this approch for enhancing the resolution of 4D Flow MRI (4DF). This is a novel imaging modality that captures hemodynamic flow velocity and clinically relevant metrics such as vessel wall stress. These metrics are critical for assessing aneurysm rupture risk. Our model is initially trained on high resolution computational fluid dynamics (CFD) simulations and their downsampled counterparts. It is then fine-tuned on a small, harmonized dataset of paired 4D Flow MRI and CFD samples. We derive the theoretical properties of our distributional estimators and demonstrate that our framework significantly outperforms traditional deep learning approaches through real data applications. This highlights the effectiveness of distributional learning in addressing domain shift and improving super-resolution performance in clinically realistic scenarios.
Extra-large apertures, high carrier frequencies, and integrated sensing and communications (ISAC) are pushing array processing into the Fresnel region, where spherical wavefronts induce a range-dependent phase across the aperture. This curvature breaks the Fourier/Vandermonde structure behind classical subspace methods, and it is especially limiting with hybrid front-ends that provide only a small number of pilot measurements. Consequently, practical systems need continuous angle resolution and joint angle-range inference where many near-field approaches still rely on costly 2D gridding. We show that convexity can meet curvature via a lifted, gridless superresolution framework for near-field measurements. The key is a Bessel-Vandermonde factorization of the Fresnel-phase manifold that exposes a hidden Vandermonde structure in angle while isolating the range dependence into a compact coefficient map. Building on this, we introduce a lifting that maps each range bin and continuous angle to a structured rank-one atom, converting the nonlinear near-field model into a linear inverse problem over a row-sparse matrix. Recovery is posed as atomic-norm minimization and an explicit dual characterization via bounded trigonometric polynomials yields certificate-based localization that super-resolves off-grid angles and identifies active range bins. Simulations with strongly undersampled hybrid observations validate reliable joint angle-range recovery for next-generation wireless and ISAC systems.
Recent advancements in real-time super-resolution have enabled higher-quality video streaming, yet existing methods struggle with the unique challenges of compressed video content. Commonly used datasets do not accurately reflect the characteristics of streaming media, limiting the relevance of current benchmarks. To address this gap, we introduce a comprehensive dataset - StreamSR - sourced from YouTube, covering a wide range of video genres and resolutions representative of real-world streaming scenarios. We benchmark 11 state-of-the-art real-time super-resolution models to evaluate their performance for the streaming use-case. Furthermore, we propose EfRLFN, an efficient real-time model that integrates Efficient Channel Attention and a hyperbolic tangent activation function - a novel design choice in the context of real-time super-resolution. We extensively optimized the architecture to maximize efficiency and designed a composite loss function that improves training convergence. EfRLFN combines the strengths of existing architectures while improving both visual quality and runtime performance. Finally, we show that fine-tuning other models on our dataset results in significant performance gains that generalize well across various standard benchmarks. We made the dataset, the code, and the benchmark available at https://github.com/EvgeneyBogatyrev/EfRLFN.
Implicit neural representations (INRs) have emerged as powerful tools for encoding signals, yet dominant MLP-based designs often suffer from slow convergence, overfitting to noise, and poor extrapolation. We introduce FUTON (Fourier Tensor Network), which models signals as generalized Fourier series whose coefficients are parameterized by a low-rank tensor decomposition. FUTON implicitly expresses signals as weighted combinations of orthonormal, separable basis functions, combining complementary inductive biases: Fourier bases capture smoothness and periodicity, while the low-rank parameterization enforces low-dimensional spectral structure. We provide theoretical guarantees through a universal approximation theorem and derive an inference algorithm with complexity linear in the spectral resolution and the input dimension. On image and volume representation, FUTON consistently outperforms state-of-the-art MLP-based INRs while training 2--5$\times$ faster. On inverse problems such as image denoising and super-resolution, FUTON generalizes better and converges faster.
Ill-posed imaging inverse problems remain challenging due to the ambiguity in mapping degraded observations to clean images. Diffusion-based generative priors have recently shown promise, but typically rely on computationally intensive iterative sampling or per-instance optimization. Amortized variational inference frameworks address this inefficiency by learning a direct mapping from measurements to posteriors, enabling fast posterior sampling without requiring the optimization of a new posterior for every new set of measurements. However, they still struggle to reconstruct fine details and complex textures. To address this, we extend the amortized framework by injecting spatially adaptive perturbations to measurements during training, guided by uncertainty estimates, to emphasize learning in the most uncertain regions. Experiments on deblurring and super-resolution demonstrate that our method achieves superior or competitive performance to previous diffusion-based approaches, delivering more realistic reconstructions without the computational cost of iterative refinement.
Super-resolution (SR) applied to real-world low-resolution (LR) images often results in complex, irregular degradations that stem from the inherent complexity of natural scene acquisition. In contrast to SR artifacts arising from synthetic LR images created under well-defined scenarios, those distortions are highly unpredictable and vary significantly across different real-life contexts. Consequently, assessing the quality of SR images (SR-IQA) obtained from realistic LR, remains a challenging and underexplored problem. In this work, we introduce a no-reference SR-IQA approach tailored for such highly ill-posed realistic settings. The proposed method enables domain-adaptive IQA for real-world SR applications, particularly in data-scarce domains. We hypothesize that degradations in super-resolved images are strongly dependent on the underlying SR algorithms, rather than being solely determined by image content. To this end, we introduce a self-supervised learning (SSL) strategy that first pretrains multiple SR model oriented representations in a pretext stage. Our contrastive learning framework forms positive pairs from images produced by the same SR model and negative pairs from those generated by different methods, independent of image content. The proposed approach S3 RIQA, further incorporates targeted preprocessing to extract complementary quality information and an auxiliary task to better handle the various degradation profiles associated with different SR scaling factors. To this end, we constructed a new dataset, SRMORSS, to support unsupervised pretext training; it includes a wide range of SR algorithms applied to numerous real LR images, which addresses a gap in existing datasets. Experiments on real SR-IQA benchmarks demonstrate that S3 RIQA consistently outperforms most state-of-the-art relevant metrics.
Portable, ultra-low-field (ULF) magnetic resonance imaging has the potential to expand access to neuroimaging but currently suffers from coarse spatial and angular resolutions and low signal-to-noise ratios. Diffusion tensor imaging (DTI), a sequence tailored to detect and reconstruct white matter tracts within the brain, is particularly prone to such imaging degradation due to inherent sequence design coupled with prolonged scan times. In addition, ULF DTI scans exhibit artifacting that spans both the space and angular domains, requiring a custom modelling algorithm for subsequent correction. We introduce a nine-direction, single-shell ULF DTI sequence, as well as a companion Bayesian bias field correction algorithm that possesses angular dependence and convolutional neural network-based superresolution algorithm that is generalizable across DTI datasets and does not require re-training (''DiffSR''). We show through a synthetic downsampling experiment and white matter assessment in real, matched ULF and high-field DTI scans that these algorithms can recover microstructural and volumetric white matter information at ULF. We also show that DiffSR can be directly applied to white matter-based Alzheimers disease classification in synthetically degraded scans, with notable improvements in agreement between DTI metrics, as compared to un-degraded scans. We freely disseminate the Bayesian bias correction algorithm and DiffSR with the goal of furthering progress on both ULF reconstruction methods and general DTI sequence harmonization. We release all code related to DiffSR for $\href{https://github.com/markolchanyi/DiffSR}{public \space use}$.
We propose AdaDS, a generalizable framework for depth super-resolution that robustly recovers high-resolution depth maps from arbitrarily degraded low-resolution inputs. Unlike conventional approaches that directly regress depth values and often exhibit artifacts under severe or unknown degradation, AdaDS capitalizes on the contraction property of Gaussian smoothing: as noise accumulates in the forward process, distributional discrepancies between degraded inputs and their pristine high-quality counterparts diminish, ultimately converging to isotropic Gaussian prior. Leveraging this, AdaDS adaptively selects a starting timestep in the reverse diffusion trajectory based on estimated refinement uncertainty, and subsequently injects tailored noise to position the intermediate sample within the high-probability region of the target posterior distribution. This strategy ensures inherent robustness, enabling generative prior of a pre-trained diffusion model to dominate recovery even when upstream estimations are imperfect. Extensive experiments on real-world and synthetic benchmarks demonstrate AdaDS's superior zero-shot generalization and resilience to diverse degradation patterns compared to state-of-the-art methods.