Abstract: We present GSLoc, a new visual localization method that performs dense camera alignment using 3D Gaussian Splatting as a map representation of the scene. GSLoc backpropagates pose gradients through the rendering pipeline to align the rendered and target images, and adopts a coarse-to-fine strategy that uses blurring kernels to mitigate the non-convexity of the problem and improve convergence. Our results show that the approach succeeds at visual localization under challenging conditions, such as relatively small overlap between the initial and target frames and textureless environments, where state-of-the-art sparse neural methods provide inferior results. Exploiting the realistic renderings that the 3DGS map representation provides as a byproduct, we show how to enhance localization by mixing observed and virtual reference keyframes when solving the image retrieval problem. We evaluate our method on both synthetic and real-world data and discuss its advantages and application potential.
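A minimal sketch of the coarse-to-fine alignment loop described in this abstract, assuming a differentiable 3DGS rasterizer is available behind a `render_fn(pose)` callable; the pose parameterization, blur schedule, and all names are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma):
    """Blur an image (B,C,H,W) with an isotropic Gaussian kernel."""
    k = int(2 * round(3 * sigma) + 1)
    x = torch.arange(k, dtype=img.dtype, device=img.device) - k // 2
    g = torch.exp(-x**2 / (2 * sigma**2))
    g = (g / g.sum()).view(1, 1, 1, k)
    w = g * g.transpose(2, 3)                      # separable -> 2D kernel
    w = w.expand(img.shape[1], 1, k, k)
    return F.conv2d(img, w, padding=k // 2, groups=img.shape[1])

def align_pose(render_fn, target, pose_init, sigmas=(8.0, 4.0, 2.0, 1.0),
               steps=100, lr=1e-2):
    """Coarse-to-fine photometric pose alignment against a differentiable renderer.

    `render_fn(pose)` must return an image differentiable w.r.t. `pose`
    (e.g., a 3DGS rasterizer with pose gradients enabled); the exact pose
    parameterization is left to the renderer.
    """
    pose = pose_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for sigma in sigmas:                           # decreasing blur = coarse-to-fine
        for _ in range(steps):
            opt.zero_grad()
            rendered = render_fn(pose)
            # Blurring both images smooths the loss landscape and widens
            # the basin of convergence of the photometric objective.
            loss = F.l1_loss(gaussian_blur(rendered, sigma),
                             gaussian_blur(target, sigma))
            loss.backward()
            opt.step()
    return pose.detach()
```

Decreasing the blur level progressively restores high-frequency detail to the photometric loss once the coarse stages have pulled the pose into the basin of attraction.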
Abstract: We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the visible parts of the surface, and guidance from pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner. We introduce three components. First, we employ normal maps as a purely geometric representation for SDS, instead of color renderings, which are entangled with appearance information. Second, we freeze the SDS noise during training, which results in more coherent gradients and better convergence. Third, we propose Multi-View SDS as a way to condition the generation of the non-observable part of the surface without fine-tuning or modifying the underlying 2D Stable Diffusion model. We evaluate our approach on the BlendedMVS dataset, demonstrating significant qualitative and quantitative improvements over competing methods.
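The second component, the frozen SDS noise, is the easiest to isolate in code. Below is a sketch of an SDS gradient with the noise sampled once and reused across iterations, assuming an epsilon-predicting diffusion model `unet(x_t, t, cond)` and a scheduler's `alphas_cumprod` table; both stand in for, e.g., Stable Diffusion components and are assumptions, not the paper's actual interfaces:

```python
import torch

class FrozenNoiseSDS:
    """Score Distillation Sampling with the noise frozen across iterations.

    `unet(x_t, t, cond)` is assumed to predict the added noise
    (epsilon-parameterization); `alphas_cumprod` holds the scheduler's
    cumulative alpha products.
    """

    def __init__(self, unet, alphas_cumprod, shape, device):
        self.unet = unet
        self.alphas = alphas_cumprod
        # Freezing the noise: sample it once and reuse it at every step,
        # which yields more coherent SDS gradients during training.
        self.eps = torch.randn(shape, device=device)

    def grad(self, x, t, cond):
        a = self.alphas[t]
        x_t = a.sqrt() * x + (1 - a).sqrt() * self.eps     # forward diffusion
        with torch.no_grad():
            eps_pred = self.unet(x_t, t, cond)
        w = 1 - a                                          # a common SDS weighting
        return w * (eps_pred - self.eps)                   # SDS gradient direction
```

In the setting above, the returned direction would be injected with `normals.backward(gradient=sds.grad(normals, t, cond))`, applied to rendered normal maps rather than color renderings.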
Abstract: In this work we present a novel optimization strategy for image reconstruction tasks under analysis-based image regularization, which promotes sparse and/or low-rank solutions in some learned transform domain. We parameterize such regularizers using potential functions that correspond to weighted extensions of the $\ell_p^p$ vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms with $0 < p \le 1$. Our proposed minimization strategy extends the Iteratively Reweighted Least Squares (IRLS) method, typically used for synthesis-based $\ell_p$ and $\mathcal{S}_p$ norm and analysis-based $\ell_1$ and nuclear-norm regularization. We prove that, under mild conditions, our minimization algorithm converges linearly to a stationary point, and we provide an upper bound for its convergence rate. Further, to select the parameters of the regularizers that deliver the best results for the problem at hand, we propose to learn them from training data by formulating the supervised learning process as a stochastic bilevel optimization problem. We show that, thanks to the convergence guarantees of our minimization strategy, this optimization can be performed successfully with a memory-efficient implicit back-propagation scheme. We implement our learned IRLS variants as recurrent networks and assess their performance on the challenging image reconstruction tasks of non-blind deblurring, super-resolution and demosaicking. Comparisons against other existing learned reconstruction approaches demonstrate that our overall method is very competitive and in many cases outperforms existing unrolled networks, whose number of parameters is orders of magnitude higher than ours.
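For concreteness, the standard IRLS step for an analysis-based $\ell_p^p$ regularizer takes the following form (generic notation, not necessarily the paper's):

```latex
% Objective: J(x) = 1/2 ||y - A x||_2^2 + lambda * sum_i |(G x)_i|^p,  0 < p <= 1,
% with A the forward operator and G the (learned) analysis transform.
% IRLS majorizes |u|^p by a quadratic at the current iterate, giving
\begin{align}
  w_i^{(k)} &= \bigl( |(\mathbf{G}\mathbf{x}^{(k)})_i|^2 + \gamma^2 \bigr)^{\frac{p}{2}-1},\\
  \mathbf{x}^{(k+1)} &= \arg\min_{\mathbf{x}}\
      \tfrac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2
      + \tfrac{\lambda p}{2}\,\bigl\|(\mathbf{W}^{(k)})^{1/2}\mathbf{G}\mathbf{x}\bigr\|_2^2.
\end{align}
% Each iteration thus reduces to a weighted least-squares problem, solvable
% with standard large-scale linear solvers; gamma > 0 smooths the quasi-norm
% so the weights stay bounded.
```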
Abstract: We introduce a novel optimization algorithm for image recovery under learned sparse and low-rank constraints, which we parameterize as weighted extensions of the $\ell_p^p$ vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms for $0\!<\!p\!\le\!1$, respectively. Our proposed algorithm generalizes the Iteratively Reweighted Least Squares (IRLS) method, used for signal recovery under $\ell_1$ and nuclear-norm constrained minimization. Further, we interpret our overall minimization approach as a recurrent network, which we then employ to deal with low-level inverse computer vision problems. Thanks to the convergence guarantees that our IRLS strategy offers, we are able to train the derived reconstruction networks using a memory-efficient implicit back-propagation scheme, which does not pose any restrictions on their effective depth. To assess our networks' performance, we compare them against other existing reconstruction methods on several inverse problems, namely image deblurring, super-resolution, demosaicking and sparse recovery. Our reconstruction results are very competitive and in many cases outperform those of existing unrolled networks, whose number of parameters is orders of magnitude higher than that of our learned models.
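A rough sketch of how such an IRLS scheme can be read as a recurrent network, with a learned analysis transform shared across stages; the architecture, the gradient-step inner solve, and all hyper-parameters below are illustrative guesses, not the paper's:

```python
import torch
import torch.nn as nn

class IRLSNet(nn.Module):
    """IRLS interpreted as a recurrent network (illustrative sketch).

    Each stage recomputes the l_p weights from the current estimate and
    takes a gradient step on the resulting weighted least-squares
    objective; the analysis transform G is a learned convolution shared
    across stages, constants absorbed into the learned `lam`.
    """

    def __init__(self, channels=1, filters=32, p=0.8, stages=10, gamma=1e-3):
        super().__init__()
        self.G = nn.Conv2d(channels, filters, 5, padding=2, bias=False)
        self.p, self.stages, self.gamma = p, stages, gamma
        self.lam = nn.Parameter(torch.tensor(0.1))   # regularization weight
        self.step = nn.Parameter(torch.tensor(0.5))  # step size

    def forward(self, y, A, At):
        """y: observation; A/At: forward operator and its adjoint (callables)."""
        x = At(y)                                    # initialize from the adjoint
        for _ in range(self.stages):
            z = self.G(x)
            # IRLS weights of the smoothed l_p^p penalty at the current iterate.
            w = (z.pow(2) + self.gamma**2).pow(self.p / 2 - 1)
            grad = At(A(x) - y) + self.lam * nn.functional.conv_transpose2d(
                w * z, self.G.weight, padding=2)     # G^T applied via transposed conv
            x = x - self.step * grad
        return x
```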
Abstract: In this work, we study the problem of non-blind image deconvolution and propose a novel recurrent network architecture that leads to very competitive restoration results of high image quality. Motivated by the computational efficiency and robustness of existing large-scale linear solvers, we express the solution to this problem as the solution of a series of adaptive non-negative least-squares problems. This gives rise to our proposed Recurrent Least Squares Deconvolution Network (RLSDN) architecture, which consists of an implicit layer that imposes a linear constraint between its input and output. By design, our network serves two important purposes simultaneously: it implicitly models an effective image prior that can adequately characterize the set of natural images, and it recovers the corresponding maximum a posteriori (MAP) estimate. Experiments on publicly available datasets, comparing against recent state-of-the-art methods, show that our proposed RLSDN approach achieves the best reported performance for both grayscale and color images across all tested scenarios. Furthermore, we introduce a novel training strategy that can be adopted by any network architecture involving the solution of linear systems as part of its pipeline. Our strategy completely eliminates the need to unroll the iterations required by the linear solver and thus significantly reduces the memory footprint during training. Consequently, it enables the training of deeper network architectures, which can further improve the reconstruction results.
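The training strategy hinted at here, differentiating through the solution of a linear system rather than through the solver's iterations, can be sketched with the implicit function theorem; the sketch below assumes a symmetric positive definite system and is not the paper's actual implementation:

```python
import torch

def cg_solve(H, b, iters=100, tol=1e-6):
    """Plain conjugate gradients for a symmetric positive definite H."""
    x = torch.zeros_like(b)
    r = b.clone(); p = r.clone(); rs = r @ r
    for _ in range(iters):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

class ImplicitLeastSquares(torch.autograd.Function):
    """Differentiate through the solution of H x = b without unrolling.

    Forward solves the system with CG; backward uses the implicit function
    theorem, which only needs one extra CG solve with H^T (= H here).
    Memory cost is independent of the number of forward CG iterations.
    """

    @staticmethod
    def forward(ctx, H, b):
        x = cg_solve(H, b)
        ctx.save_for_backward(H, x)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        H, x = ctx.saved_tensors
        # Implicit gradients: solve H u = dL/dx, then
        # dL/dH = -u x^T and dL/db = u.
        u = cg_solve(H, grad_out)
        return -u.unsqueeze(-1) @ x.unsqueeze(-2), u
```

Calling `ImplicitLeastSquares.apply(H, b)` inside a network then behaves like an exact linear solve in the forward pass, while the backward pass costs a single extra CG solve regardless of how many forward iterations were needed.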
Abstract: Microscopy is a powerful visualization tool in biology, enabling the study of cells, tissues, and fundamental biological processes. Yet the observed images of objects at the micro-scale suffer from two major inherent distortions: blur caused by the diffraction of light, and background noise caused by the imperfections of the imaging detectors. The latter is especially severe in fluorescence and confocal microscopes, which are known for operating at low photon counts with Poisson noise statistics. Restoration of such images is usually accomplished by image deconvolution that takes the noise statistics into account, solving an optimization problem given some prior information about the underlying data (i.e., regularization). In this work, we propose a unifying framework of algorithms for Poisson image deblurring and denoising. The algorithms are based on deep learning techniques for the design of learnable regularizers paired with an appropriate optimization scheme. Our extensive experiments show that the proposed approach achieves superior image reconstruction quality and outperforms solutions that rely on deep learning or on optimization schemes alone. Moreover, several implementations of the proposed framework demonstrate competitive performance at low computational complexity, which is of high importance for real-time imaging applications.
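In generic notation (illustrative, not necessarily the paper's), the MAP objective underlying Poisson deblurring pairs the Poisson negative log-likelihood with a learnable regularizer:

```latex
% MAP estimation under Poisson noise: the data term is the Poisson negative
% log-likelihood of the blurred image Ax (A = convolution with the PSF), and
% R is a learnable regularizer with weight lambda.
\begin{equation}
  \hat{\mathbf{x}} \;=\; \arg\min_{\mathbf{x}\,\ge\,\mathbf{0}}\;
  \sum_i \Bigl[ (\mathbf{A}\mathbf{x})_i - y_i \log (\mathbf{A}\mathbf{x})_i \Bigr]
  \;+\; \lambda\, \mathcal{R}(\mathbf{x}).
\end{equation}
% Setting lambda = 0 recovers Richardson-Lucy-style maximum-likelihood
% deconvolution; the learned R is what the framework above contributes.
```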
Abstract: Modern inexpensive imaging sensors suffer from inherent hardware constraints that often result in captured images of poor quality. Among the most common ways to deal with such limitations is burst photography, which nowadays acts as the backbone of all modern smartphone imaging applications. In this work, we exploit the fact that every frame of a burst sequence can be accurately described by a forward (physical) model, which allows us to restore a single image of higher quality from a sequence of low-quality images as the solution of an optimization problem. Inspired by an extension of gradient descent that can handle non-smooth functions, namely proximal gradient descent, and by modern deep learning techniques, we propose a convolutional iterative network with a transparent architecture. Our network uses a burst of low-quality image frames and produces an output of higher image quality, recovering fine details that are not distinguishable in any of the original burst frames. We focus both on the burst photography pipeline as a whole, i.e., joint burst demosaicking and denoising, and on the traditional Gaussian denoising task. The developed method demonstrates consistent state-of-the-art performance across the two tasks and, as opposed to other recent deep learning approaches, does not have any inherent restrictions on either the number of frames or their ordering.
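A compact sketch of such a proximal-gradient burst network, where a small CNN plays the role of the learned proximal operator; the per-frame forward operators and all hyper-parameters are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class BurstPGDNet(nn.Module):
    """Proximal gradient descent unrolled for burst restoration (sketch).

    Each burst frame y_k is assumed to follow y_k = A_k(x) + n_k, with
    A/At callables giving the per-frame forward operator and its adjoint
    (e.g., warp + mosaic); `prox` is a small CNN standing in for the
    learned proximal operator.
    """

    def __init__(self, stages=8, channels=3, width=64):
        super().__init__()
        self.prox = nn.Sequential(                 # learned proximal step
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1))
        self.step = nn.Parameter(torch.tensor(0.5))
        self.stages = stages

    def forward(self, burst, A, At, x0):
        """burst: list of frames y_k; A(k, x) / At(k, r) apply A_k and A_k^T."""
        x = x0
        for _ in range(self.stages):
            # Gradient of the data term summed over all burst frames.
            grad = sum(At(k, A(k, x) - y_k) for k, y_k in enumerate(burst))
            x = self.prox(x - self.step * grad)    # prox replaces the regularizer
        return x
```

Because the data-term gradient is a plain sum over frames, the iteration is by construction indifferent to the number of burst frames and to their ordering.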
Abstract: Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually related to denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to solve these problems jointly, i.e., joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have recently been introduced to tackle this problem, in this work we propose a novel algorithm inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative neural network has a transparent and clear interpretation compared to other black-box data-driven approaches. The extensive comparisons we report demonstrate the superiority of our proposed network, which outperforms previous approaches on both noisy and noise-free data across many different datasets, using fewer training samples. This improvement in reconstruction quality is attributed to the principled way we design and train our network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution.
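The observation model behind joint denoising-demosaicking can be simulated in a few lines; the RGGB layout and Gaussian read noise below are common assumptions, not details taken from the paper:

```python
import torch

def bayer_degrade(x, sigma):
    """Simulate the joint demosaicking-denoising observation model.

    x: clean RGB image (B,3,H,W); an RGGB Bayer pattern is assumed. Each
    pixel keeps exactly one of the three color channels (so two-thirds of
    the intensity information is missing) and the retained samples are
    perturbed by Gaussian noise of standard deviation `sigma`.
    """
    mask = torch.zeros_like(x)
    mask[:, 0, 0::2, 0::2] = 1          # R at even rows / even cols
    mask[:, 1, 0::2, 1::2] = 1          # G at even rows / odd cols
    mask[:, 1, 1::2, 0::2] = 1          # G at odd rows / even cols
    mask[:, 2, 1::2, 1::2] = 1          # B at odd rows / odd cols
    return mask * (x + sigma * torch.randn_like(x)), mask
```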
Abstract: Demosaicking and denoising are among the most crucial steps of modern digital camera pipelines, and their joint treatment is a highly ill-posed inverse problem where at least two-thirds of the information is missing and the rest is corrupted by noise. This poses a great challenge in obtaining meaningful reconstructions, and special care is required for the efficient treatment of the problem. While several machine learning approaches have recently been introduced to deal with joint image demosaicking-denoising, in this work we propose a novel deep learning architecture inspired by powerful classical image regularization methods and large-scale convex optimization techniques. Consequently, our derived network is more transparent and has a clearer interpretation compared to alternative competitive deep learning approaches. Our extensive experiments demonstrate that our network outperforms previous approaches on both noisy and noise-free data. This improvement in reconstruction quality is attributed to the principled way we design our network architecture, which also requires fewer trainable parameters than the current state-of-the-art deep network solution. Finally, we show that our network generalizes well even when trained on small datasets, while keeping the overall number of trainable parameters low.
Abstract: We design a novel network architecture for learning discriminative image models that are employed to efficiently tackle the problem of grayscale and color image denoising. Based on the proposed architecture, we introduce two different variants. The first network involves convolutional layers as its core component, while the second relies instead on non-local filtering layers and is thus able to exploit the inherent non-local self-similarity of natural images. As opposed to most existing deep network approaches, which require training a specific model for each considered noise level, the proposed models handle a wide range of noise levels with a single set of learned parameters, and they remain very robust when the noise degrading the latent image does not match the statistics of the noise used during training. We support the latter claim with results on publicly available images corrupted by unknown noise, which we compare against solutions obtained by competing methods. At the same time, the introduced networks achieve excellent results under additive white Gaussian noise (AWGN), comparable to those of the current state-of-the-art network, while relying on a shallower architecture with an order of magnitude fewer trained parameters. These properties make the proposed networks ideal candidates to serve as sub-solvers in restoration methods that deal with general inverse imaging problems such as deblurring, demosaicking, super-resolution, etc.
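A bare-bones version of a non-local filtering layer in the spirit of the second variant, computing non-local-means-style weights from patch distances inside a search window; this illustrates the principle of exploiting self-similarity, not the paper's actual layer, and all sizes are placeholders:

```python
import torch
import torch.nn.functional as F

def nonlocal_filter(x, patch=7, window=15, h=0.1):
    """Non-local filtering of x (B,C,H,W) via patch-similarity weights.

    For every pixel, the values at all displacements inside a search
    window are averaged with softmax-like weights derived from patch
    distances, in the style of non-local means.
    """
    _, _, H, W = x.shape
    pad = window // 2
    xp = F.pad(x, (pad, pad, pad, pad), mode='reflect')
    num = torch.zeros_like(x)
    den = torch.zeros_like(x[:, :1])
    # Aggregate contributions over all displacements in the search window.
    for dy in range(window):
        for dx in range(window):
            shifted = xp[:, :, dy:dy + H, dx:dx + W]
            # Patch distance approximated by box-filtering the pixel distance.
            d2 = F.avg_pool2d((x - shifted).pow(2).mean(1, keepdim=True),
                              patch, stride=1, padding=patch // 2)
            w = torch.exp(-d2 / (h * h))
            num = num + w * shifted
            den = den + w
    return num / den
```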