Title: PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

Oct 15, 2024

Shang-Ching Liu, Van Nhiem Tran, Wenkai Chen, Wei-Lun Cheng, Yen-Lin Huang, I-Bin Liao, Yung-Hui Li, Jianwei Zhang

Abstract: Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Vision-Language Models (VLMs) have excelled at high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), a framework that uses the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined, context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods on both full and partial point clouds, and that it generalizes particularly well to novel open-world affordance tasks on 3D objects. For more information, visit our project site: pavlm-source.github.io.
