Abstract:Combining CNN with CRF for modeling dependencies between pixel labels is a popular research direction. This task is far from trivial, especially if end-to-end training is desired. In this paper, we propose a novel simple approach to CNN+CRF combination. In particular, we propose to simulate a CRF regularizer with a trainable module that has standard CNN architecture. We call this module a CRF Simulator. We can automatically generate an unlimited amount of ground truth for training such CRF Simulator without any user interaction, provided we have an efficient algorithm for optimization of the actual CRF regularizer. After our CRF Simulator is trained, it can be directly incorporated as part of any larger CNN architecture, enabling a seamless end-to-end training. In particular, the other modules can learn parameters that are more attuned to the performance of the CRF Simulator module. We demonstrate the effectiveness of our approach on the task of salient object segmentation regularized with the standard binary CRF energy. In contrast to previous work we do not need to develop and implement the complex mechanics of optimizing a specific CRF as part of CNN. In fact, our approach can be easily extended to other CRF energies, including multi-label. To the best of our knowledge we are the first to study the question of whether the output of CNNs can have regularization properties of CRFs.
Abstract:Curvature has received increased attention as an important alternative to length based regularization in computer vision. In contrast to length, it preserves elongated structures and fine details. Existing approaches are either inefficient, or have low angular resolution and yield results with strong block artifacts. We derive a new model for computing squared curvature based on integral geometry. The model counts responses of straight line triple cliques. The corresponding energy decomposes into submodular and supermodular pairwise potentials. We show that this energy can be efficiently minimized even for high angular resolutions using the trust region framework. Our results confirm that we obtain accurate and visually pleasing solutions without strong artifacts at reasonable run times.
Abstract:Many computer vision problems require optimization of binary non-submodular energies. We propose a general optimization framework based on local submodular approximations (LSA). Unlike standard LP relaxation methods that linearize the whole energy globally, our approach iteratively approximates the energies locally. On the other hand, unlike standard local optimization methods (e.g. gradient descent or projection techniques) we use non-linear submodular approximations and optimize them without leaving the domain of integer solutions. We discuss two specific LSA algorithms based on "trust region" and "auxiliary function" principles, LSA-TR and LSA-AUX. These methods obtain state-of-the-art results on a wide range of applications outperforming many standard techniques such as LBP, QPBO, and TRWS. While our paper is focused on pairwise energies, our ideas extend to higher-order problems. The code is available online (http://vision.csd.uwo.ca/code/).
Abstract:High-order (non-linear) functionals have become very popular in segmentation, stereo and other computer vision problems. Level sets is a well established general gradient descent framework, which is directly applicable to optimization of such functionals and widely used in practice. Recently, another general optimization approach based on trust region methodology was proposed for regional non-linear functionals. Our goal is a comprehensive experimental comparison of these two frameworks in regard to practical efficiency, robustness to parameters, and optimality. We experiment on a wide range of problems with non-linear constraints on segment volume, appearance and shape.