Abstract:We present a framework for interactive design of new image stylizations using a wide range of predefined filter blocks. Both novel and off-the-shelf image filtering and rendering techniques are extended and combined to allow the user to unleash their creativity to intuitively invent, modify, and tune new styles from a given set of filters. In parallel to this manual design, we propose a novel procedural approach that automatically assembles sequences of filters, leading to unique and novel styles. An important aim of our framework is to allow for interactive exploration and design, as well as to enable videos and camera streams to be stylized on the fly. In order to achieve this real-time performance, we use the \textit{Best Linear Adaptive Enhancement} (BLADE) framework -- an interpretable shallow machine learning method that simulates complex filter blocks in real time. Our representative results include over a dozen styles designed using our interactive tool, a set of styles created procedurally, and new filters trained with our BLADE approach.
Abstract:In this work, we broadly connect kernel-based filtering (e.g. approaches such as the bilateral filters and nonlocal means, but also many more) with general variational formulations of Bayesian regularized least squares, and the related concept of proximal operators. The latter set of variational/Bayesian/proximal formulations often result in optimization problems that do not have closed-form solutions, and therefore typically require global iterative solutions. Our main contribution here is to establish how one can approximate the solution of the resulting global optimization problems with use of locally adaptive filters with specific kernels. Our results are valid for small regularization strength but the approach is powerful enough to be useful for a wide range of applications because we expose how to derive a "kernelized" solution to these problems that approximates the global solution in one-shot, using only local operations. As another side benefit in the reverse direction, given a local data-adaptive filter constructed with a particular choice of kernel, we enable the interpretation of such filters in the variational/Bayesian/proximal framework.
Abstract:Denoising is a fundamental imaging problem. Versatile but fast filtering has been demanded for mobile camera systems. We present an approach to multiscale filtering which allows real-time applications on low-powered devices. The key idea is to learn a set of kernels that upscales, filters, and blends patches of different scales guided by local structure analysis. This approach is trainable so that learned filters are capable of treating diverse noise patterns and artifacts. Experimental results show that the presented approach produces comparable results to state-of-the-art algorithms while processing time is orders of magnitude faster.
Abstract:The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of problems in computational photography. We show applications to operations which may appear in a camera pipeline including denoising, demosaicing, and stylization.
Abstract:Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field eval sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.