Title: CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer

Dec 14, 2023

Sicheng Wang, Hao Jiang, Lei Xiang

Abstract: Recent deep multi-view stereo (MVS) methods have widely incorporated transformers into cascade networks for high-resolution depth estimation, achieving impressive results. However, existing transformer-based methods are constrained by their computational costs, which prevents their extension to finer stages. In this paper, we propose a novel cross-scale transformer (CT) that processes feature representations at different stages without additional computation. Specifically, we introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales. This combined strategy enables our network to capture intra-image context information and enhance inter-image feature relationships. In addition, we present a dual-feature guided aggregation (DFGA) that embeds coarse global semantic information into the finer cost volume construction to further strengthen global and local feature awareness. We also design a feature metric loss (FM Loss) that evaluates the feature bias before and after transformation to reduce the impact of feature mismatch on depth estimation. Extensive experiments on the DTU dataset and the Tanks and Temples (T&T) benchmark demonstrate that our method achieves state-of-the-art results. Code is available at https://github.com/wscstrive/CT-MVSNet.
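
To make the distinction between intra-image context and inter-image relationships concrete, here is a minimal NumPy sketch of generic scaled dot-product attention used in two ways: within one image's features (self-attention) and from a source image's features to a reference image's features (cross-attention). This is not the authors' AMT implementation. The function names, the residual connections, the feature shapes, and the single self-then-cross ordering are all illustrative assumptions; the actual combinations used per scale are described in the paper and the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    # Generic scaled dot-product attention.
    # query: (Nq, C), key and value: (Nk, C); returns (Nq, C).
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def intra_attention(feat):
    # Hypothetical helper: self-attention within one image's flattened
    # feature map, with a residual connection (an assumption).
    return feat + attention(feat, feat, feat)

def inter_attention(src, ref):
    # Hypothetical helper: cross-attention from source features to
    # reference features, with a residual connection (an assumption).
    return src + attention(src, ref, ref)

# Toy flattened feature maps: 16 reference pixels, 20 source pixels, 8 channels.
rng = np.random.default_rng(0)
ref = rng.standard_normal((16, 8))
src = rng.standard_normal((20, 8))

ref_ctx = intra_attention(ref)               # intra-image context
src_matched = inter_attention(src, ref_ctx)  # inter-image relationships
```

Each output keeps the shape of its query features, so blocks like these can be chained in different orders or counts at different stages of a cascade.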

* Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)
