Abstract:Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, with the dimension of the subspace precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically to zero. Overall, our results contribute to explaining two key phenomena in deep networks: (i) shallow models and simple tasks do not always exhibit EOS; and (ii) oscillations occur within top features. We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from those which have benign landscape such as in DLNs.
Abstract:In recent years, deep learning methods have been extensively developed for inverse imaging problems (IIPs), encompassing supervised, self-supervised, and generative approaches. Most of these methods require large amounts of labeled or unlabeled training data to learn effective models. However, in many practical applications, such as medical image reconstruction, extensive training datasets are often unavailable or limited. A significant milestone in addressing this challenge came in 2018 with the work of Ulyanov et al., which introduced the Deep Image Prior (DIP)--the first training-data-free neural network method for IIPs. Unlike conventional deep learning approaches, DIP requires only a convolutional neural network, the noisy measurements, and a forward operator. By leveraging the implicit regularization of deep networks initialized with random noise, DIP can learn and restore image structures without relying on external datasets. However, a well-known limitation of DIP is its susceptibility to overfitting, primarily due to the over-parameterization of the network. In this tutorial paper, we provide a comprehensive review of DIP, including a theoretical analysis of its training dynamics. We also categorize and discuss recent advancements in DIP-based methods aimed at mitigating overfitting, including techniques such as regularization, network re-parameterization, and early stopping. Furthermore, we discuss approaches that combine DIP with pre-trained neural networks, present empirical comparison results against data-centric methods, and highlight open research questions and future directions.
Abstract:Robust tensor principal component analysis (RTPCA) aims to separate the low-rank and sparse components from multi-dimensional data, making it an essential technique in the signal processing and computer vision fields. Recently emerging tensor singular value decomposition (t-SVD) has gained considerable attention for its ability to better capture the low-rank structure of tensors compared to traditional matrix SVD. However, existing methods often rely on the computationally expensive tensor nuclear norm (TNN), which limits their scalability for real-world tensors. To address this issue, we explore an efficient scaled gradient descent (SGD) approach within the t-SVD framework for the first time, and propose the RTPCA-SGD method. Theoretically, we rigorously establish the recovery guarantees of RTPCA-SGD under mild assumptions, demonstrating that with appropriate parameter selection, it achieves linear convergence to the true low-rank tensor at a constant rate, independent of the condition number. To enhance its practical applicability, we further propose a learnable self-supervised deep unfolding model, which enables effective parameter learning. Numerical experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed methods while maintaining competitive computational efficiency, especially consuming less time than RTPCA-TNN.
Abstract:Deep learning methods are highly effective for many image reconstruction tasks. However, the performance of supervised learned models can degrade when applied to distinct experimental settings at test time or in the presence of distribution shifts. In this study, we demonstrate that pruning deep image reconstruction networks at training time can improve their robustness to distribution shifts. In particular, we consider unrolled reconstruction architectures for accelerated magnetic resonance imaging and introduce a method for pruning unrolled networks (PUN) at initialization. Our experiments demonstrate that when compared to traditional dense networks, PUN offers improved generalization across a variety of experimental settings and even slight performance gains on in-distribution data.
Abstract:Deep learning (DL) methods have been extensively applied to various image recovery problems, including magnetic resonance imaging (MRI) and computed tomography (CT) reconstruction. Beyond supervised models, other approaches have been recently explored including two key recent schemes: Deep Image Prior (DIP) that is an unsupervised scan-adaptive method that leverages the network architecture as implicit regularization but can suffer from noise overfitting, and diffusion models (DMs), where the sampling procedure of a pre-trained generative model is modified to allow sampling from the measurement-conditioned distribution through approximations. In this paper, we propose combining DIP and DMs for MRI and CT reconstruction, motivated by (i) the impact of the DIP network input and (ii) the use of DMs as diffusion purifiers (DPs). Specifically, we propose an unrolled procedure that iteratively optimizes the DIP network with a DM-refined adaptive input using a loss with data consistency and autoencoding terms. We term the approach unrolled Diffusion-Guided DIP (uDiG-DIP). Our experimental results demonstrate that uDiG-DIP achieves superior reconstruction results compared to leading DM-based baselines and the original DIP for MRI and CT tasks.
Abstract:Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse imaging problems (IPs), the reverse sampling steps of DMs are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such as in the presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier sampling steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces the standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions, either implicitly or explicitly, our sampler requires significantly fewer reverse steps. Therefore, we refer to our accelerated method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to existing state-of-the-art baseline methods, under different levels of measurement noise, our extensive experiments across five linear and three non-linear image restoration tasks demonstrate that SITCOM achieves competitive or superior results in terms of standard image similarity metrics while requiring a significantly reduced run-time across all considered tasks.
Abstract:Density reconstruction from X-ray projections is an important problem in radiography with key applications in scientific and industrial X-ray computed tomography (CT). Often, such projections are corrupted by unknown sources of noise and scatter, which when not properly accounted for, can lead to significant errors in density reconstruction. In the setting of this problem, recent deep learning-based methods have shown promise in improving the accuracy of density reconstruction. In this article, we propose a deep learning-based encoder-decoder framework wherein the encoder extracts robust features from noisy/corrupted X-ray projections and the decoder reconstructs the density field from the features extracted by the encoder. We explore three options for the latent-space representation of features: physics-inspired supervision, self-supervision, and no supervision. We find that variants based on self-supervised and physicsinspired supervised features perform better over a range of unknown scatter and noise. In extreme noise settings, the variant with self-supervised features performs best. After investigating further details of the proposed deep-learning methods, we conclude by demonstrating that the newly proposed methods are able to achieve higher accuracy in density reconstruction when compared to a traditional iterative technique.
Abstract:We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. Typically, untrained deep convolutional networks, which include image sampling operations, serve as effective image priors (Ulyanov et al., 2018). However, they tend to overfit to noise in image restoration tasks due to being overparameterized. OES addresses this by adaptively pruning networks at random initialization to a level of underparameterization. This process effectively captures low-frequency image components even without training, by just masking. When trained to fit noisy images, these pruned subnetworks, which we term Sparse-DIP, resist overfitting to noise. This benefit arises from underparameterization and the regularization effect of masking, constraining them in the manifold of image priors. We demonstrate that subnetworks pruned through OES surpass other leading pruning methods, such as the Lottery Ticket Hypothesis, which is known to be suboptimal for image recovery tasks (Wu et al., 2023). Our extensive experiments demonstrate the transferability of OES-masks and the characteristics of sparse-subnetworks for image generation. Code is available at https://github.com/Avra98/Optimal-Eye-Surgeon.git.
Abstract:The ability of deep image prior (DIP) to recover high-quality images from incomplete or corrupted measurements has made it popular in inverse problems in image restoration and medical imaging including magnetic resonance imaging (MRI). However, conventional DIP suffers from severe overfitting and spectral bias effects. In this work, we first provide an analysis of how DIP recovers information from undersampled imaging measurements by analyzing the training dynamics of the underlying networks in the kernel regime for different architectures. This study sheds light on important underlying properties for DIP-based recovery. Current research suggests that incorporating a reference image as network input can enhance DIP's performance in image reconstruction compared to using random inputs. However, obtaining suitable reference images requires supervision, and raises practical difficulties. In an attempt to overcome this obstacle, we further introduce a self-driven reconstruction process that concurrently optimizes both the network weights and the input while eliminating the need for training data. Our method incorporates a novel denoiser regularization term which enables robust and stable joint estimation of both the network input and reconstructed image. We demonstrate that our self-guided method surpasses both the original DIP and modern supervised methods in terms of MR image reconstruction performance and outperforms previous DIP-based schemes for image inpainting.
Abstract:Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-steps process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories, and employing a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings. These observations indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all time steps. Our approach involves segmenting the time interval into multiple stages where we employ custom multi-decoder U-net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showcasing significant training and sampling efficiency enhancements on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components in our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-net architecture, seamlessly integrating universal and customized hyperparameters.