Title: UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching

Sep 04, 2024

Soomin Kim, Hyesong Choi, Jihye Ahn, Dongbo Min

Figures 1-4 for UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching


Abstract: Unlike other vision tasks, where Transformer-based approaches are becoming increasingly common, stereo depth estimation is still dominated by convolution-based approaches. This is mainly due to the limited availability of real-world ground truth for stereo matching, which restricts the performance gains achievable by Transformer-based stereo approaches. In this paper, we propose UniTT-Stereo, a method that maximizes the potential of Transformer-based stereo architectures by unifying the self-supervised learning used for pre-training with a stereo matching framework based on supervised learning. Specifically, from the perspective of locality inductive bias, which is crucial when training models with limited data, we explore the effectiveness of reconstructing features of masked portions of an input image while simultaneously predicting corresponding points in the other image. Moreover, to address these challenging reconstruction-and-prediction tasks, we present a new strategy that varies the masking ratio when training the stereo model with stereo-tailored losses. The state-of-the-art performance of UniTT-Stereo is validated on various benchmarks, including the ETH3D, KITTI 2012, and KITTI 2015 datasets. Lastly, to investigate the advantages of the proposed approach, we provide a frequency analysis of feature maps and an analysis of locality inductive bias based on attention maps.
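The abstract does not say how the masking ratio is varied or how masked patches are chosen. The Python sketch below only illustrates the general idea of hiding a varying fraction of one view's patches during training. The linear ramp from 0.0 to 0.5, the uniform random patch selection (as in MAE-style masking), the 16x16 patch grid, and the function names `masking_ratio` and `random_patch_mask` are all hypothetical choices, not the authors' settings.

```python
import random
from typing import List, Optional


def masking_ratio(step: int, total_steps: int,
                  start: float = 0.0, end: float = 0.5) -> float:
    """Hypothetical schedule: linearly ramp the masking ratio from `start` to `end`.

    The abstract only says the ratio is varied during training; the linear
    ramp and the default endpoints are illustrative assumptions.
    """
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so steps past the end keep the final ratio.
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


def random_patch_mask(num_patches: int, ratio: float,
                      rng: Optional[random.Random] = None) -> List[bool]:
    """Return one flag per patch; True means the patch is masked (hidden).

    Patches are chosen uniformly at random; whether UniTT-Stereo selects
    patches this way is an assumption.
    """
    rng = rng or random.Random()
    ratio = min(max(ratio, 0.0), 1.0)
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


if __name__ == "__main__":
    rng = random.Random(0)
    num_patches = (224 // 16) ** 2  # 196 patches for a 224x224 view with 16x16 patches
    total_steps = 1000
    for step in (0, 500, 999):
        ratio = masking_ratio(step, total_steps)
        # Mask only the left view; the right view stays intact so the model can
        # predict correspondences for the hidden left-view patches (assumption).
        left_mask = random_patch_mask(num_patches, ratio, rng)
        print(f"step {step}: ratio={ratio:.3f}, masked={sum(left_mask)}")
```

With these illustrative settings, the sketch masks no patches at the first step and 98 of the 196 patches at the last step. A ramp is only one way to vary the ratio; sampling a ratio per batch would fit the abstract's wording equally well.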