Mohan Sridharan

AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement

Feb 04, 2025

Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

Abstract: Embodied agents assisting humans are often asked to complete a new task in a new scenario. An agent preparing a particular dish in the kitchen based on a known recipe may be asked to prepare a new dish or to perform cleaning tasks in the storeroom. There may not be sufficient resources, e.g., time or labeled examples, to train the agent for these new situations. Large Language Models (LLMs) trained on considerable knowledge across many domains are able to predict a sequence of abstract actions for such new tasks and scenarios, although it may not be possible for the agent to execute this action sequence due to task-, agent-, or domain-specific constraints. Our framework addresses these challenges by leveraging the generic predictions provided by an LLM and the prior domain-specific knowledge encoded in a Knowledge Graph (KG), enabling an agent to quickly adapt to new tasks and scenarios. The robot also solicits and uses human input as needed to refine its existing knowledge. Based on experimental evaluation over cooking and cleaning tasks in simulation domains, we demonstrate that the interplay between the LLM, the KG, and human input leads to substantial performance gains compared with just using the LLM output.

* Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025

Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments

Feb 04, 2025

Raghav Arora, Shivam Singh, Karthik Swaminathan, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna

Abstract: Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.

* Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2024

Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration

Apr 04, 2024

Shivam Singh, Karthik Swaminathan, Raghav Arora, Ramandeep Singh, Ahana Datta, Dipanjan Das, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

Abstract: An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof-of-concept framework that used an LLM to anticipate 3 high-level tasks that served as goals for a classical planning system that computed a sequence of low-level actions for the agent to achieve these goals. This paper describes DaTAPlan, our framework that significantly extends our prior work toward human-robot collaboration. Specifically, the DaTAPlan planner computes actions for an agent and a human to collaboratively and jointly achieve the tasks anticipated by the LLM, and the agent automatically adapts to unexpected changes in human action outcomes and preferences. We evaluate DaTAPlan's capabilities in a realistic simulation environment, demonstrating accurate task anticipation, effective human-robot collaboration, and the ability to adapt to unexpected changes. Project website: https://dataplan-hrc.github.io

Relevance Score: A Landmark-Like Heuristic for Planning

Mar 12, 2024

Oliver Kim, Mohan Sridharan

Abstract: Landmarks are facts or actions that appear in all valid solutions of a planning problem. They have been used successfully to calculate heuristics that guide the search for a plan. We investigate an extension to this concept by defining a novel "relevance score" that helps identify facts or actions that appear in most but not all plans to achieve any given goal. We describe an approach to compute this relevance score and use it as a heuristic in the search for a plan. We experimentally compare the performance of our approach with that of a state-of-the-art landmark-based heuristic planning approach using benchmark planning problems. While the original landmark-based heuristic leads to better performance on problems with well-defined landmarks, our approach substantially improves performance on problems that lack non-trivial landmarks.

* 12 Pages, 3 figures

CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities

Jun 02, 2023

Ayush Agrawal, Raghav Arora, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna

Abstract: This paper introduces a novel method for determining the best room to place an object in, for embodied scene rearrangement. While state-of-the-art approaches rely on large language models (LLMs) or reinforcement learned (RL) policies for this task, our approach, CLIPGraphs, efficiently combines commonsense domain knowledge, data-driven methods, and recent advances in multimodal learning. Specifically, it (a) encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities based on embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate location of objects from a benchmark set of object categories in comparison with state-of-the-art baselines.

* RO-MAN 2023 Conference

Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork

Jun 01, 2023

Hasra Dodampegama, Mohan Sridharan

Abstract: Ad hoc teamwork refers to the problem of enabling an agent to collaborate with teammates without prior coordination. Data-driven methods represent the state of the art in ad hoc teamwork. They use a large labeled dataset of prior observations to model the behavior of other agent types and to determine the ad hoc agent's behavior. These methods are computationally expensive, lack transparency, and make it difficult to adapt to previously unseen changes, e.g., in team composition. Our recent work introduced an architecture that determined an ad hoc agent's behavior based on non-monotonic logical reasoning with prior commonsense domain knowledge and predictive models of other agents' behavior that were learned from limited examples. In this paper, we substantially expand the architecture's capabilities to support: (a) online selection, adaptation, and learning of the models that predict the other agents' behavior; and (b) collaboration with teammates in the presence of partial observability and limited communication. We illustrate and experimentally evaluate the capabilities of our architecture in two simulated multiagent benchmark domains for ad hoc teamwork: Fort Attack and Half Field Offense. We show that the performance of our architecture is comparable to or better than that of state-of-the-art data-driven baselines in both simple and complex scenarios, particularly in the presence of limited training data, partial observability, and changes in team composition.

* 17 pages, 3 Figures

RAMP: A Benchmark for Evaluating Robotic Assembly Manipulation and Planning

May 16, 2023

Jack Collins, Mark Robson, Jun Yamada, Mohan Sridharan, Karol Janik, Ingmar Posner

Abstract: We introduce RAMP, an open-source robotics benchmark inspired by real-world industrial assembly tasks. RAMP consists of beams that a robot must assemble into specified goal configurations using pegs as fasteners. As such, it assesses planning and execution capabilities, and poses challenges in perception, reasoning, manipulation, diagnostics, fault recovery and goal parsing. RAMP has been designed to be accessible and extensible. Parts are either 3D printed or otherwise constructed from materials that are readily obtainable. The part design and detailed instructions are publicly available. In order to broaden community engagement, RAMP incorporates fixtures such as April Tags which enable researchers to focus on individual sub-tasks of the assembly challenge if desired. We provide a full digital twin as well as rudimentary baselines to enable rapid progress. Our vision is for RAMP to form the substrate for a community-driven endeavour that evolves as capability matures.

* Project website: https://sites.google.com/oxfordrobotics.institute/ramp

Sequence-Agnostic Multi-Object Navigation

May 10, 2023

Nandiraju Gireesh, Ayush Agrawal, Ahana Datta, Snehasis Banerjee, Mohan Sridharan, Brojeshwar Bhowmick, Madhava Krishna

Abstract: The Multi-Object Navigation (MultiON) task requires a robot to localize an instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed this as a direct extension of Object Navigation (ON), the task of localizing an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.

* ICRA 2023 conference

Toward a Framework for Adaptive Impedance Control of an Upper-limb Prosthesis

Sep 11, 2022

Laura Ferrante, Mohan Sridharan, Claudio Zito, Dario Farina

Abstract: This paper describes a novel framework for a human-machine interface that can be used to control an upper-limb prosthesis. The objective is to estimate the human's motor intent from noisy surface electromyography signals and to execute the motor intent on the prosthesis (i.e., the robot) even in the presence of previously unseen perturbations. The framework includes muscle-tendon models for each degree of freedom, a method for learning the parameter values of models used to estimate the user's motor intent, and a variable impedance controller that uses the stiffness and damping values obtained from the muscle models to adapt the prosthesis' motion trajectory and dynamics. We experimentally evaluate our framework in the context of able-bodied humans using a simulated version of the human-machine interface to perform reaching tasks that primarily actuate one degree of freedom in the wrist, and consider external perturbations in the form of a uniform force field that pushes the wrist away from the target. We demonstrate that our framework provides the desired adaptive performance, and substantially improves performance in comparison with a data-driven baseline.

Spatial Relation Graph and Graph Convolutional Network for Object Goal Navigation

Aug 27, 2022

D. A. Sasi Kiran, Kritika Anand, Chaitanya Kharyal, Gulshan Kumar, Nandiraju Gireesh, Snehasis Banerjee, Ruddra dev Roychoudhury, Mohan Sridharan, Brojeshwar Bhowmick, Madhava Krishna

Abstract: This paper describes a framework for the object-goal navigation task, which requires a robot to find and move to the closest instance of a target object class from a random starting position. The framework uses a history of robot trajectories to learn a Spatial Relational Graph (SRG) and Graph Convolutional Network (GCN)-based embeddings for the likelihood of proximity of different semantically-labeled regions and the occurrence of different object classes in these regions. To locate a target object instance during evaluation, the robot uses Bayesian inference and the SRG to estimate the visible regions, and uses the learned GCN embeddings to rank visible regions and select the region to explore next.

* CASE 2022 paper
