Abstract:Recent advances in monocular depth prediction have led to significantly improved depth prediction accuracy. In turn, this enables various applications to use such depth predictions. In this paper, we propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale and shift parameter, our solvers jointly estimate both scale and shift parameters together with the camera pose. We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths. Experiments on synthetic and real data, including experiments with depth maps estimated by 11 different depth predictors, show the practical viability of our solvers. Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets. The source code is available at https://github.com/yaqding/pose_monodepth
Abstract:We consider the problem of two-view matching under significant viewpoint changes with view synthesis. We propose two novel methods, minimizing the view synthesis overhead. The first one, named DenseAffNet, uses dense affine shapes estimates from AffNet, which allows it to partition the image, rectifying each partition with just a single affine map. The second one, named DepthAffNet, combines information from depth maps and affine shapes estimates to produce different sets of rectifying affine maps for different image partitions. DenseAffNet is faster than the state-of-the-art and more accurate on generic scenes. DepthAffNet is on par with the state of the art on scenes containing large planes. The evaluation is performed on 3 public datasets - EVD Dataset, Strong ViewPoint Changes Dataset and IMC Phototourism Dataset.