Abhinav Pathak

A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking

Apr 09, 2025

Abhinav Pathak, Kalaichelvi Venkatesan, Tarek Taha, Rajkumar Muthusamy

Abstract: The growing presence of service robots in human-centric environments such as warehouses demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. It is powered by a Large Language Model (LLM) that uses Chain of Thought (CoT) reasoning, together with a physics-based simulation engine for safely retrieving boxes from cluttered stacks on shelves and a relationship graph for sub-task generation, extraction sequence planning, and decision making. We validate the framework through real-world shelf-picking experiments, including 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing, and 3) Collaborative Stability Assistance.
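The abstract does not say how the relationship graph is turned into an extraction sequence. As a minimal sketch of one common way to do it (an assumption, not the authors' method), the support relations can be treated as a directed graph and the boxes removed in topological order, so that no box is pulled while another still rests on it. The function name `extraction_order` and the `supports` mapping are hypothetical.

```python
from collections import defaultdict, deque


def extraction_order(supports):
    """Return an order in which boxes can be removed top-down.

    `supports` is a hypothetical input format: it maps each box name to the
    list of boxes resting directly on it. A box becomes removable only once
    every box resting on it has been removed (Kahn's topological sort).
    """
    blockers = defaultdict(int)  # box -> number of boxes still resting on it
    below = defaultdict(list)    # box -> boxes directly underneath it
    boxes = set(supports)
    for lower, uppers in supports.items():
        for upper in uppers:
            boxes.add(upper)
            blockers[lower] += 1
            below[upper].append(lower)

    # Start with the boxes that carry nothing; sorted only for determinism.
    ready = deque(sorted(b for b in boxes if blockers[b] == 0))
    order = []
    while ready:
        box = ready.popleft()
        order.append(box)
        for lower in below[box]:
            blockers[lower] -= 1
            if blockers[lower] == 0:
                ready.append(lower)

    if len(order) != len(boxes):
        raise ValueError("support relations contain a cycle")
    return order
```

For a stack where B and C rest on A and D rests on B, calling `extraction_order({"A": ["B", "C"], "B": ["D"], "C": [], "D": []})` yields an order that removes C and D before B, and B before A. A real planner would also have to weigh reachability and the physics-based stability checks the abstract mentions, which this sketch ignores.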

Collapse and Collision Aware Grasping for Cluttered Shelf Picking

Mar 28, 2025

Abhinav Pathak, Rajkumar Muthusamy

Abstract: Efficient and safe retrieval of stacked objects in warehouse environments is a significant challenge due to complex spatial and structural inter-dependencies. Traditional vision-based methods excel at object localization but often lack the physical reasoning required to predict the consequences of extraction, leading to unintended collisions and collapses. This paper proposes a collapse and collision aware grasp planner that integrates dynamic physics simulations into robotic decision-making. From a single image and depth map, an approximate 3D representation of the scene is reconstructed in a simulation environment, enabling the robot to evaluate different retrieval strategies before execution. Two approaches, 1) heuristic-based and 2) physics-based, are proposed for both single-box extraction and shelf clearance tasks. Extensive real-world experiments on structured and unstructured box stacks, along with validation using datasets from existing databases, show that our physics-aware method significantly improves efficiency and success rates compared to baseline heuristics.
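The abstract does not describe the baseline heuristic. The sketch below shows one simple geometric heuristic of that general kind, under stated assumptions: boxes are axis-aligned rectangles seen from the front of the shelf, and a box counts as free to extract when no other box rests on its top face. The `Box` class, `rests_on`, `free_boxes`, and the tolerance value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Front-view footprint of a box, in metres (hypothetical representation)."""
    name: str
    x_min: float
    x_max: float
    z_min: float  # height of the bottom face
    z_max: float  # height of the top face


def rests_on(upper, lower, tol=0.01):
    """True if `upper` sits on the top face of `lower` with horizontal overlap."""
    touching = abs(upper.z_min - lower.z_max) <= tol
    overlap = min(upper.x_max, lower.x_max) - max(upper.x_min, lower.x_min)
    return touching and overlap > tol


def free_boxes(boxes, tol=0.01):
    """Names of boxes that carry no other box, i.e. candidates to extract first."""
    return [
        b.name
        for b in boxes
        if not any(rests_on(other, b, tol) for other in boxes if other is not b)
    ]
```

A purely geometric rule like this cannot tell whether removing a free box shifts its neighbours or whether the gripper collides on the way out, which is the gap a physics simulation is meant to cover.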