Title: Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

May 28, 2024

Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

Abstract: In real-life scenarios, humans seek out objects in the 3D world to fulfill their daily needs or intentions. This inspires us to introduce 3D intention grounding, a new task in 3D object detection using RGB-D scans, in which the target is specified by a human intention such as "I want something to support my back". The closely related task of 3D visual grounding focuses on understanding human reference. To achieve detection based on human intention, 3D visual grounding relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch". In contrast, 3D intention grounding challenges AI agents to automatically observe, reason, and detect the desired target based solely on human intention. To tackle this challenge, we introduce the new Intent3D dataset, consisting of 44,990 intention texts associated with 209 fine-grained classes from 1,042 scenes of the ScanNet dataset. We also establish several baselines on our benchmark using different language-based 3D object detection models. Finally, we propose IntentNet, our approach designed to tackle this intention-based detection problem. It focuses on three key aspects: intention understanding, reasoning to identify object candidates, and cascaded adaptive learning that leverages the intrinsic priority logic of different losses for multi-objective optimization.
