https://github.com/cogsys-tuebingen/tristereonet.
Stereo vision is an effective technique for depth estimation with broad applicability in autonomous urban and highway driving. While various deep learning-based approaches have been developed for stereo, the input data from a binocular setup with a fixed baseline are limited. Addressing such a problem, we present an end-to-end network for processing the data from a trinocular setup, which is a combination of a narrow and a wide stereo pair. In this design, two pairs of binocular data with a common reference image are treated with shared weights of the network and a mid-level fusion. We also propose a Guided Addition method for merging the 4D data of the two baselines. Additionally, an iterative sequential self-supervised and supervised learning on real and synthetic datasets is presented, making the training of the trinocular system practical with no need to ground-truth data of the real dataset. Experimental results demonstrate that the trinocular disparity network surpasses the scenario where individual pairs are fed into a similar architecture. Code and dataset: