Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gora Chand Nandi

GRIM: Task-Oriented Grasping with Conditioning on Generative Examples

Jun 18, 2025

Shailesh, Alok Raj, Nayan Kumar, Priya Shukla, Andrew Melnik, Micheal Beetz, Gora Chand Nandi

Abstract:Task-Oriented Grasping (TOG) presents a significant challenge, requiring a nuanced understanding of task semantics, object affordances, and the functional constraints dictating how an object should be grasped for a specific task. To address these challenges, we introduce GRIM (Grasp Re-alignment via Iterative Matching), a novel training-free framework for task-oriented grasping. Initially, a coarse alignment strategy is developed using a combination of geometric cues and principal component analysis (PCA)-reduced DINO features for similarity scoring. Subsequently, the full grasp pose associated with the retrieved memory instance is transferred to the aligned scene object and further refined against a set of task-agnostic, geometrically stable grasps generated for the scene object, prioritizing task compatibility. In contrast to existing learning-based methods, GRIM demonstrates strong generalization capabilities, achieving robust performance with only a small number of conditioning examples.

Via

Access Paper or Ask Questions

SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Nov 21, 2024

Arjun P S, Andrew Melnik, Gora Chand Nandi

Figure 1 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 2 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 3 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 4 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Abstract:Experience Goal Visual Rearrangement task stands as a foundational challenge within Embodied AI, requiring an agent to construct a robust world model that accurately captures the goal state. The agent uses this world model to restore a shuffled scene to its original configuration, making an accurate representation of the world essential for successfully completing the task. In this work, we present a novel framework that leverages on 3D Gaussian Splatting as a 3D scene representation for experience goal visual rearrangement task. Recent advances in volumetric scene representation like 3D Gaussian Splatting, offer fast rendering of high quality and photo-realistic novel views. Our approach enables the agent to have consistent views of the current and the goal setting of the rearrangement task, which enables the agent to directly compare the goal state and the shuffled state of the world in image space. To compare these views, we propose to use a dense feature matching method with visual features extracted from a foundation model, leveraging its advantages of a more universal feature representation, which facilitates robustness, and generalization. We validate our approach on the AI2-THOR rearrangement challenge benchmark and demonstrate improvements over the current state of the art methods

Via

Access Paper or Ask Questions

Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Jul 09, 2024

Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi(+35 more)

Figure 1 for Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Figure 2 for Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Figure 3 for Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Figure 4 for Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Abstract:In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only an 0.8% success rate; by the end of the competition, the best participants achieved an 10.8\% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.

Via

Access Paper or Ask Questions

Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Mar 30, 2024

Arjun P S, Andrew Melnik, Gora Chand Nandi

Figure 1 for Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Figure 2 for Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Figure 3 for Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Figure 4 for Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Abstract:Recent advancements in Generative Artificial Intelligence, particularly in the realm of Large Language Models (LLMs) and Large Vision Language Models (LVLMs), have enabled the prospect of leveraging cognitive planners within robotic systems. This work focuses on solving the object goal navigation problem by mimicking human cognition to attend, perceive and store task specific information and generate plans with the same. We introduce a comprehensive framework capable of exploring an unfamiliar environment in search of an object by leveraging the capabilities of Large Language Models(LLMs) and Large Vision Language Models (LVLMs) in understanding the underlying semantics of our world. A challenging task in using LLMs to generate high level sub-goals is to efficiently represent the environment around the robot. We propose to use a 3D scene modular representation, with semantically rich descriptions of the object, to provide the LLM with task relevant information. But providing the LLM with a mass of contextual information (rich 3D scene semantic representation), can lead to redundant and inefficient plans. We propose to use an LLM based pruner that leverages the capabilities of in-context learning to prune out irrelevant goal specific information.

Via

Access Paper or Ask Questions

UniTeam: Open Vocabulary Mobile Manipulation Challenge

Dec 14, 2023

Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke

Figure 1 for UniTeam: Open Vocabulary Mobile Manipulation Challenge

Figure 2 for UniTeam: Open Vocabulary Mobile Manipulation Challenge

Figure 3 for UniTeam: Open Vocabulary Mobile Manipulation Challenge

Figure 4 for UniTeam: Open Vocabulary Mobile Manipulation Challenge

Abstract:This report introduces our UniTeam agent - an improved baseline for the "HomeRobot: Open Vocabulary Mobile Manipulation" challenge. The challenge poses problems of navigation in unfamiliar environments, manipulation of novel objects, and recognition of open-vocabulary object classes. This challenge aims to facilitate cross-cutting research in embodied AI using recent advances in machine learning, computer vision, natural language, and robotics. In this work, we conducted an exhaustive evaluation of the provided baseline agent; identified deficiencies in perception, navigation, and manipulation skills; and improved the baseline agent's performance. Notably, enhancements were made in perception - minimizing misclassifications; navigation - preventing infinite loop commitments; picking - addressing failures due to changing object visibility; and placing - ensuring accurate positioning for successful object placement.

Via

Access Paper or Ask Questions

Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Dec 10, 2023

Jannik Sheikh, Andrew Melnik, Gora Chand Nandi, Robert Haschke

Figure 1 for Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Figure 2 for Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Figure 3 for Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Figure 4 for Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Abstract:Reinforcement learning and Imitation Learning approaches utilize policy learning strategies that are difficult to generalize well with just a few examples of a task. In this work, we propose a language-conditioned semantic search-based method to produce an online search-based policy from the available demonstration dataset of state-action trajectories. Here we directly acquire actions from the most similar manipulation trajectories found in the dataset. Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. This holds great potential for expanding the use of our online search-based policy approach to tasks typically addressed by Imitation Learning or Reinforcement Learning-based policies.

Via

Access Paper or Ask Questions

Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction

Jan 12, 2020

Shruti Jaiswal, Gora Chand Nandi

Figure 1 for Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction

Figure 2 for Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction

Figure 3 for Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction

Figure 4 for Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction

Abstract:To enable humanoid robots to share our social space we need to develop technology for easy interaction with the robots using multiple modes such as speech, gestures and share our emotions with them. We have targeted this research towards addressing the core issue of emotion recognition problem which would require less computation resources and much lesser number of network hyperparameters which will be more adaptive to be computed on low resourced social robots for real time communication. More specifically, here we have proposed an Inception module based Convolutional Neural Network Architecture which has achieved improved accuracy of upto 6% improvement over the existing network architecture for emotion classification when combinedly tested over multiple datasets when tried over humanoid robots in real - time. Our proposed model is reducing the trainable Hyperparameters to an extent of 94% as compared to vanilla CNN model which clearly indicates that it can be used in real time based application such as human robot interaction. Rigorous experiments have been performed to validate our methodology which is sufficiently robust and could achieve high level of accuracy. Finally, the model is implemented in a humanoid robot, NAO in real time and robustness of the model is evaluated.

* 14 pages

Via

Access Paper or Ask Questions