Alexander Rich

VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

Dec 01, 2021

Noah Stier, Alexander Rich, Pradeep Sen, Tobias Höllerer

Abstract: Recent volumetric 3D reconstruction methods can produce very accurate results, with plausible geometry even for unobserved surfaces. However, they face an undesirable trade-off when it comes to multi-view fusion. They can fuse all available view information by global averaging, thus losing fine detail, or they can heuristically cluster views for local fusion, thus restricting their ability to consider all views jointly. Our key insight is that greater detail can be retained without restricting view diversity by learning a view-fusion function conditioned on camera pose and image content. We propose to learn this multi-view fusion using a transformer. To this end, we introduce VoRTX, an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion. Our model is occlusion-aware, leveraging the transformer architecture to predict an initial, projective scene geometry estimate. This estimate is used to avoid backprojecting image features through surfaces into occluded regions. We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods. We also demonstrate generalization without any fine-tuning, outperforming the same state-of-the-art methods on two other datasets, TUM-RGBD and ICL-NUIM.

* 3DV 2021
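
The contrast the abstract draws between global averaging and a learned, content-dependent fusion can be illustrated with a toy example for a single voxel. This is only a sketch and not the VoRTX architecture: the function names `fuse_mean` and `fuse_attention` are hypothetical, and the per-view scores are hand-picked stand-ins for what a trained transformer would predict from camera pose and image content.

```python
import numpy as np

def fuse_mean(view_feats):
    """Global averaging: every view contributes equally to the voxel."""
    return view_feats.mean(axis=0)

def fuse_attention(view_feats, scores):
    """Weighted fusion: a softmax over per-view scores decides each view's share.

    In a learned model the scores would come from the network; here they are
    supplied by hand purely for illustration.
    """
    weights = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    return weights @ view_feats, weights

# Three views observe one voxel, each giving a 2-D feature (toy values).
view_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])

averaged = fuse_mean(view_feats)

# Assumed scores: the first view is judged far more informative than the others.
fused, weights = fuse_attention(view_feats, np.array([10.0, 0.0, 0.0]))
```

With equal scores the weighted fusion reduces to the plain average, so averaging can be seen as the special case in which no view is preferred.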


3DVNet: Multi-View Depth Prediction and Volumetric Refinement

Dec 01, 2021

Alexander Rich, Noah Stier, Pradeep Sen, Tobias Höllerer

Abstract: We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.

* 10 pages, 6 figures, 3 tables. Accepted to 3DV 2021
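
To make "operates in world space on all depth maps jointly" concrete, the sketch below shows one standard way to lift a depth map into a world-space point cloud with a pinhole camera model. It is an illustration under stated assumptions, not the 3DVNet code: the function name `backproject_depth`, the intrinsics `K`, and the 4x4 camera-to-world pose are all hypothetical, and the per-point features the abstract mentions are omitted.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to an (H*W, 3) array of world-space points.

    Assumes a pinhole camera with 3x3 intrinsics K and a 4x4 camera-to-world
    pose; depth is measured along the camera z-axis.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous pixel coordinates (u, v, 1), flattened row by row.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T           # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)     # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                  # rotate and translate into world space

# Toy inputs: identity intrinsics, constant depth, camera shifted 1 unit along x.
K = np.eye(3)
pose = np.eye(4)
pose[0, 3] = 1.0
points = backproject_depth(np.full((2, 2), 2.0), K, pose)
```

Concatenating the points from several views (each with its own pose) gives a single cloud in a shared coordinate frame, which is the kind of input a world-space network could then process.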
