Abstract:We propose a learning-based depth from focus/defocus (DFF), which takes a focal stack as input for estimating scene depth. Defocus blur is a useful cue for depth estimation. However, the size of the blur depends on not only scene depth but also camera settings such as focus distance, focal length, and f-number. Current learning-based methods without any defocus models cannot estimate a correct depth map if camera settings are different at training and test times. Our method takes a plane sweep volume as input for the constraint between scene depth, defocus images, and camera settings, and this intermediate representation enables depth estimation with different camera settings at training and test times. This camera-setting invariance can enhance the applicability of learning-based DFF methods. The experimental results also indicate that our method is robust against a synthetic-to-real domain gap, and exhibits state-of-the-art performance.
Abstract:We propose a learning-based multi-view stereo (MVS) method in scattering media, such as fog or smoke, with a novel cost volume, called the dehazing cost volume. Images captured in scattering media are degraded due to light scattering and attenuation caused by suspended particles. This degradation depends on scene depth; thus, it is difficult for traditional MVS methods to evaluate photometric consistency because the depth is unknown before three-dimensional (3D) reconstruction. The dehazing cost volume can solve this chicken-and-egg problem of depth estimation and image restoration by computing the scattering effect using swept planes in the cost volume. We also propose a method of estimating scattering parameters, such as airlight, and a scattering coefficient, which are required for our dehazing cost volume. The output depth of a network with our dehazing cost volume can be regarded as a function of these parameters; thus, they are geometrically optimized with a sparse 3D point cloud obtained at a structure-from-motion step. Experimental results on synthesized hazy images indicate the effectiveness of our dehazing cost volume against the ordinary cost volume regarding scattering media. We also demonstrated the applicability of our dehazing cost volume to real foggy scenes.
Abstract:This paper proposes a novel approach for unsupervised domain adaptation (UDA) with target shift. Target shift is a problem of mismatch in label distribution between source and target domains. Typically it appears as class-imbalance in target domain. In practice, this is an important problem in UDA; as we do not know labels in target domain datasets, we do not know whether or not its distribution is identical to that in the source domain dataset. Many traditional approaches achieve UDA with distribution matching by minimizing mean maximum discrepancy or adversarial training; however these approaches implicitly assume a coincidence in the distributions and do not work under situations with target shift. Some recent UDA approaches focus on class boundary and some of them are robust to target shift, but they are only applicable to classification and not to regression. To overcome the target shift problem in UDA, the proposed method, partially shared variational autoencoders (PS-VAEs), uses pair-wise feature alignment instead of feature distribution matching. PS-VAEs inter-convert domain of each sample by a CycleGAN-based architecture while preserving its label-related content. To evaluate the performance of PS-VAEs, we carried out two experiments: UDA with class-unbalanced digits datasets (classification), and UDA from synthesized data to real observation in human-pose-estimation (regression). The proposed method presented its robustness against the class-imbalance in the classification task, and outperformed the other methods in the regression task with a large margin.
Abstract:Three-dimensional (3D) reconstruction and scene depth estimation from 2-dimensional (2D) images are major tasks in computer vision. However, using conventional 3D reconstruction techniques gets challenging in participating media such as murky water, fog, or smoke. We have developed a method that uses a time-of-flight (ToF) camera to estimate an object region and depth in participating media simultaneously. The scattering component is saturated, so it does not depend on the scene depth, and received signals bouncing off distant points are negligible due to light attenuation in the participating media, so the observation of such a point contains only a scattering component. These phenomena enable us to estimate the scattering component in an object region from a background that only contains the scattering component. The problem is formulated as robust estimation where the object region is regarded as outliers, and it enables the simultaneous estimation of an object region and depth on the basis of an iteratively reweighted least squares (IRLS) optimization scheme. We demonstrate the effectiveness of the proposed method using captured images from a Kinect v2 in real foggy scenes and evaluate the applicability with synthesized data.
Abstract:Images captured in participating media such as murky water, fog, or smoke are degraded by scattered light. Thus, the use of traditional three-dimensional (3D) reconstruction techniques in such environments is difficult. In this paper, we propose a photometric stereo method for participating media. The proposed method differs from previous studies with respect to modeling shape-dependent forward scatter. In the proposed model, forward scatter is described as an analytical form using lookup tables and is represented by spatially-variant kernels. We also propose an approximation of a large-scale dense matrix as a sparse matrix, which enables the removal of forward scatter. Experiments with real and synthesized data demonstrate that the proposed method improves 3D reconstruction in participating media.
Abstract:Outlier detection and cluster number estimation is an important issue for clustering real data. This paper focuses on spectral clustering, a time-tested clustering method, and reveals its important properties related to outliers. The highlights of this paper are the following two mathematical observations: first, spectral clustering's intrinsic property of an outlier cluster formation, and second, the singularity of an outlier cluster with a valid cluster number. Based on these observations, we designed a function that evaluates clustering and outlier detection results. In experiments, we prepared two scenarios, face clustering in photo album and person re-identification in a camera network. We confirmed that the proposed method detects outliers and estimates the number of clusters properly in both problems. Our method outperforms state-of-the-art methods in both the 128-dimensional sparse space for face clustering and the 4,096-dimensional non-sparse space for person re-identification.