Title: A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking

Apr 09, 2025

Abhinav Pathak, Kalaichelvi Venkatesan, Tarek Taha, Rajkumar Muthusamy

Abstract: The growing presence of service robots in human-centric environments such as warehouses demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. It is powered by a Large Language Model (LLM) that uses Chain of Thought (CoT) reasoning and a physics-based simulation engine to safely retrieve boxes from cluttered stacks on shelves, together with a relationship graph for sub-task generation, extraction sequence planning, and decision making. We validate the framework through real-world shelf-picking experiments: 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing, and 3) Collaborative Stability Assistance.
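The abstract does not say how the relationship graph drives extraction sequence planning. As a rough illustration only, one simple reading is that the graph records which box rests on which, and a safe sequence removes a box only after everything stacked on it is gone. The sketch below assumes that reading; the function name `extraction_order`, the `rests_on` mapping, and the box labels are hypothetical and not taken from the paper, and the paper's LLM and physics-simulation components are not modeled here.

```python
from graphlib import TopologicalSorter


def extraction_order(rests_on):
    """Return one safe removal order for a stack of boxes.

    rests_on maps each box to the set of boxes directly beneath it
    (an assumed encoding of a support relationship graph).
    """
    # A box is blocked by every box resting on it, so invert the relation:
    # blockers[lower] is the set of boxes that must be removed before lower.
    blockers = {box: set() for box in rests_on}
    for upper, lowers in rests_on.items():
        for lower in lowers:
            blockers.setdefault(lower, set()).add(upper)
    # TopologicalSorter yields each node after all of its predecessors,
    # and raises graphlib.CycleError if the relations are contradictory.
    return list(TopologicalSorter(blockers).static_order())


# Hypothetical three-box column: A rests on B, B rests on C.
column = {"A": {"B"}, "B": {"C"}, "C": set()}
print(extraction_order(column))
```

For a single column the order is forced (top box first). When several boxes share a support, more than one order is valid, and a fuller planner would need an additional criterion, such as a simulated stability check, to choose among them.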
