We propose an embarrassingly simple, but very effective scheme for high-quality dense stereo reconstruction: (i) generate an approximate reconstruction with your favourite stereo matcher; (ii) rewarp the input images with that approximate model; and (iii) with the initial reconstruction and the warped images as input, train a deep network to enhance the reconstruction by regressing a residual correction. The strategy to only learn the residual greatly simplifies the learning problem. A standard Unet without bells and whistles is enough to reconstruct even small surface details, like dormers and roof substructures in satellite images. We also investigate residual reconstruction with less information and find that even a single image is enough to greatly improve an approximate reconstruction. Our full model reduces the mean absolute error of state-of-the-art stereo reconstruction systems by >50%, both in our target domain of satellite stereo and on stereo pairs from the ETH3D benchmark.

Title:ResDepth: Learned Residual Stereo Reconstruction

Paper and Code