Abstract: This paper proposes a 4D backbone for long-term point cloud video understanding. A typical way to capture spatial-temporal context is to use 4D convolutions or transformers without hierarchy. However, such methods are neither effective nor efficient enough due to camera motion, scene changes, sampling patterns, and the complexity of 4D data. To address these issues, we leverage the primitive plane as a mid-level representation to capture the long-term spatial-temporal context in 4D point cloud videos and propose a novel hierarchical backbone named Point Primitive Transformer (PPTr), which is mainly composed of intra-primitive point transformers and primitive transformers. Extensive experiments show that PPTr outperforms previous state-of-the-art methods on different tasks.
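The following is a minimal, hypothetical sketch of the two-level attention hierarchy the abstract describes: attention among the points inside each primitive plane, then attention across primitive features over space and time. The module names, tensor shapes, and the use of PyTorch's nn.TransformerEncoderLayer are assumptions for illustration only, not the authors' PPTr implementation.

# Sketch, assuming point features already grouped into primitive planes.
import torch
import torch.nn as nn

class HierarchicalPrimitiveAttention(nn.Module):
    """Intra-primitive attention over points, then attention across primitives."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Level 1: attention among the points inside each primitive plane.
        self.point_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        # Level 2: attention among primitive features across all frames.
        self.primitive_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, primitives, points_per_primitive, dim)
        b, t, p, n, d = x.shape
        # Intra-primitive point transformer: attend within each primitive.
        pts = self.point_attn(x.reshape(b * t * p, n, d))
        # Pool points into one feature per primitive (mid-level representation).
        prim = pts.mean(dim=1).reshape(b, t * p, d)
        # Primitive transformer: long-term spatial-temporal context across
        # the primitives of all frames.
        return self.primitive_attn(prim).reshape(b, t, p, d)

# Usage: 2 videos, 8 frames, 16 primitive planes, 32 points each, 64-dim features.
feats = torch.randn(2, 8, 16, 32, 64)
out = HierarchicalPrimitiveAttention()(feats)
print(out.shape)  # torch.Size([2, 8, 16, 64])

Because level-2 attention runs over per-primitive features rather than raw points, its cost grows with the number of primitives instead of the number of points, which is the efficiency argument the abstract makes for the hierarchy.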
Abstract: We present MVLayoutNet, an end-to-end network for holistic 3D reconstruction from multi-view panoramas. Our core contribution is to seamlessly combine learned monocular layout estimation and multi-view stereo (MVS) for accurate layout reconstruction in both 3D and image space. We jointly train a layout module to produce an initial layout and a novel MVS module to obtain accurate layout geometry. Unlike the standard MVSNet [33], our MVS module takes a newly proposed layout cost volume, which aggregates multi-view costs at the same depth layer into corresponding layout elements. We additionally provide an attention-based scheme that guides the MVS module to focus on structural regions. Such a design considers both local pixel-level costs and global holistic information for better reconstruction. Experiments show that our method outperforms state-of-the-art methods in terms of depth RMSE by 21.7% and 20.6% on the 2D-3D-S [1] and ZInD [5] datasets, respectively. Finally, our method produces coherent layout geometry that enables the reconstruction of an entire scene.
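The following is a minimal, hypothetical sketch of the layout-cost-volume idea: per-pixel multi-view matching costs at each depth hypothesis are pooled into the layout element (e.g. a wall segment) that covers each pixel. The function name, shapes, and mean pooling are assumptions for illustration, not the MVLayoutNet implementation.

# Sketch, assuming precomputed per-pixel multi-view matching costs.
import torch

def layout_cost_volume(pixel_costs: torch.Tensor,
                       element_ids: torch.Tensor,
                       num_elements: int) -> torch.Tensor:
    """Aggregate per-pixel costs into per-layout-element costs.

    pixel_costs: (depths, pixels) multi-view matching cost per pixel per depth.
    element_ids: (pixels,) index of the layout element covering each pixel.
    Returns:     (depths, num_elements) mean cost of each element at each depth.
    """
    d, p = pixel_costs.shape
    sums = torch.zeros(d, num_elements)
    counts = torch.zeros(num_elements)
    # Scatter-add each pixel's cost column into its layout element.
    sums.index_add_(1, element_ids, pixel_costs)
    counts.index_add_(0, element_ids, torch.ones(p))
    return sums / counts.clamp(min=1)

# Usage: 4 depth hypotheses, 6 pixels grouped into 2 layout elements.
costs = torch.rand(4, 6)
ids = torch.tensor([0, 0, 0, 1, 1, 1])
print(layout_cost_volume(costs, ids, 2).shape)  # torch.Size([4, 2])

Pooling costs per layout element is what turns local pixel-level evidence into the global, holistic signal the abstract refers to: every pixel on the same wall votes for a single depth layer for that wall.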