Haomeng Zhang

Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention

Oct 29, 2024

Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh

Abstract: Multi-object 3D Grounding involves locating 3D boxes based on a given query phrase from a point cloud. It is a challenging and significant task with numerous applications in visual understanding, human-computer interaction, and robotics. To tackle this challenge, we introduce D-LISA, a two-stage approach incorporating three innovations. First, a dynamic vision module that enables a variable and learnable number of box proposals. Second, a dynamic camera positioning that extracts features for each proposal. Third, a language-informed spatial attention module that better reasons over the proposals to output the final prediction. Empirically, experiments show that our method outperforms the state-of-the-art methods on multi-object 3D grounding by 12.8% (absolute) and is competitive in single-object 3D grounding.

* NeurIPS 2024

Hyperspherical Embedding for Point Cloud Completion

Jul 11, 2023

Junming Zhang, Haomeng Zhang, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: Most real-world 3D measurements from depth sensors are incomplete, and to address this issue the point cloud completion task aims to predict the complete shapes of objects from partial observations. Previous works often adopt an encoder-decoder architecture, where the encoder is trained to extract embeddings that are used as inputs to generate predictions from the decoder. However, the learned embeddings are sparsely distributed in the feature space, which leads to worse generalization results during testing. To address this problem, this paper proposes a hyperspherical module, which transforms and normalizes embeddings from the encoder to be on a unit hypersphere. With the proposed module, the magnitude and direction of the output hyperspherical embedding are decoupled and only the directional information is optimized. We theoretically analyze the hyperspherical embedding and show that it enables more stable training with a wider range of learning rates and more compact embedding distributions. Experimental results show consistent improvement of point cloud completion in both single-task and multi-task learning, which demonstrates the effectiveness of the proposed method.
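
The "transform and normalize onto a unit hypersphere" step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `to_hypersphere`, the bias-free linear map `weights`, the dimensions, and the `eps` guard are all assumptions chosen for illustration. It only shows that after L2 normalization the output has unit length, so rescaling the input embedding leaves the result unchanged and only direction carries information.

```python
import math
import random


def to_hypersphere(embedding, weights, eps=1e-12):
    """Apply a bias-free linear map, then L2-normalize the result.

    `weights` is a hypothetical stand-in for a learned transform;
    the paper's actual layer may differ.
    """
    transformed = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    norm = math.sqrt(sum(v * v for v in transformed))
    # Guard against division by zero for a degenerate all-zero vector.
    return [v / max(norm, eps) for v in transformed]


random.seed(0)
dim_in, dim_out = 8, 4  # illustrative sizes, not taken from the paper
weights = [[random.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]
embedding = [random.gauss(0.0, 1.0) for _ in range(dim_in)]

unit = to_hypersphere(embedding, weights)
# Scaling the input changes magnitude only, so the normalized output is the same.
scaled = to_hypersphere([10.0 * x for x in embedding], weights)
```

Here `unit` and `scaled` coincide up to floating-point error, which is the sense in which magnitude is decoupled from direction in this toy setting.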
