Daehyung Park

A Survey on Integration of Large Language Models with Intelligent Robots

Apr 14, 2024

Yeseung Kim, Dohyun Kim, Jieun Choi, Jisang Park, Nayoung Oh, Daehyung Park

Abstract:In recent years, the integration of large language models (LLMs) has revolutionized the field of robotics, enabling robots to communicate, understand, and reason with human-like proficiency. This paper explores the multifaceted impact of LLMs on robotics, addressing key challenges and opportunities for leveraging these models across various domains. By categorizing and analyzing LLM applications within core robotics elements -- communication, perception, planning, and control -- we aim to provide actionable insights for researchers seeking to integrate LLMs into their robotic systems. Our investigation focuses on LLMs developed post-GPT-3.5, primarily in text-based modalities while also considering multimodal approaches for perception and control. We offer comprehensive guidelines and examples for prompt engineering, facilitating beginners' access to LLM-based robotics solutions. Through tutorial-level examples and structured prompt construction, we illustrate how LLM-guided enhancements can be seamlessly integrated into robotics applications. This survey serves as a roadmap for researchers navigating the evolving landscape of LLM-driven robotics, offering a comprehensive overview and practical guidance for harnessing the power of language models in robotics development.

* 24 pages, 1 figure, Submitted to Intelligent Service Robotics (ISR)

LINGO-Space: Language-Conditioned Incremental Grounding for Space

Feb 02, 2024

Dohyun Kim, Nayoung Oh, Deokmin Hwang, Daehyung Park

Abstract:We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (LINGO-Space) that accurately identifies a probabilistic distribution of space being referred to and incrementally updates it, given subsequent referring expressions leveraging configurable polar distributions. Our evaluations show that the estimation using polar distributions enables a robot to ground locations successfully through $20$ table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referring space. We finally demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. Code and videos are available at https://lingo-space.github.io.

* Accepted by AAAI 2024

Graph-based 3D Collision-distance Estimation Network with Probabilistic Graph Rewiring

Oct 06, 2023

Minjae Song, Yeseung Kim, Daehyung Park

Abstract:We aim to solve the problem of data-driven collision-distance estimation given 3-dimensional (3D) geometries. Conventional algorithms suffer from low accuracy due to their reliance on limited representations, such as point clouds. In contrast, our previous graph-based model, GraphDistNet, achieves high accuracy using edge information but incurs higher message-passing costs with growing graph size, limiting its applicability to 3D geometries. To overcome these challenges, we propose GDN-R, a novel 3D graph-based estimation network. GDN-R employs a layer-wise probabilistic graph-rewiring algorithm leveraging the differentiable Gumbel-top-K relaxation. Our method accurately infers minimum distances through iterative graph rewiring and updating relevant embeddings. The probabilistic rewiring enables fast and robust embedding with respect to unforeseen categories of geometries. Through 41,412 random benchmark tasks with 150 pairs of 3D objects, we show GDN-R outperforms state-of-the-art baseline methods in terms of accuracy and generalizability. We also show that the proposed rewiring improves the update performance reducing the size of the estimation model. We finally show its batch prediction and auto-differentiation capabilities for trajectory optimization in both simulated and real-world scenarios.

* 7 pages, 6 figures
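
The Gumbel-top-K idea named in this abstract can be illustrated with a minimal sketch. This is not the authors' GDN-R code: it shows only the standard hard (non-differentiable) Gumbel-top-k sampling step, whereas the paper uses a differentiable relaxation. The function name `gumbel_top_k` and the example logits are assumptions made for illustration.

```python
import math
import random


def gumbel_top_k(logits, k, rng=random):
    """Sample k distinct indices by adding Gumbel(0, 1) noise to each logit.

    Hypothetical helper for illustration; not taken from the GDN-R paper.
    """
    perturbed = []
    for index, logit in enumerate(logits):
        # Clamp the uniform sample away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise, index))
    # The k largest perturbed scores form a sample without replacement.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


if __name__ == "__main__":
    random.seed(0)
    # Assumed example: scores for five candidate edges, keep two of them.
    print(gumbel_top_k([0.1, 0.5, 0.2, 0.9, 0.3], k=2))
```

In a rewiring setting, the logits would presumably score candidate edges and the selected indices would be the edges kept for message passing; that mapping is an assumption here, not a detail stated in the abstract.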

SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation

Jul 14, 2023

Dohyun Kim, Yeseung Kim, Jaehwi Jang, Minjae Song, Woojin Choi, Daehyung Park

Abstract:The spoken language serves as an accessible and efficient interface, enabling non-experts and disabled users to interact with complex assistant robots. However, accurately grounding language utterances poses a significant challenge due to the acoustic variability in speakers' voices and environmental noise. In this work, we propose a novel speech-scene graph grounding network (SGGNet$^2$) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. To incorporate the acoustic similarity, we extend our previous grounding model, the scene-graph-based grounding network (SGGNet), with the ASR model from NVIDIA NeMo. We accomplish this by feeding the latent vector of speech pronunciations into the BERT-based grounding network within SGGNet. We evaluate the effectiveness of using latent vectors of speech commands in grounding through qualitative and quantitative studies. We also demonstrate the capability of SGGNet$^2$ in a speech-based navigation task using a real quadruped robot, RBQ-3, from Rainbow Robotics.

* 7 pages, 6 figures, Paper accepted for the Special Session at the 2023 International Symposium on Robot and Human Interactive Communication (RO-MAN), [Dohyun Kim, Yeseung Kim, Jaehwi Jang, and Minjae Song] contributed equally to this work

Inverse Constraint Learning and Generalization by Transferable Reward Decomposition

Jun 21, 2023

Jaehwi Jang, Minjae Song, Daehyung Park

Abstract:We present the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations to autonomously reproduce constrained skills in new scenarios. However, ICL suffers from an ill-posed nature, leading to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. Our method TCL additively decomposes the overall reward into a task reward and its residual as soft constraints, maximizing policy divergence between task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method and four baselines in three simulated environments, we show TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to $72\%$ higher task-success rates with accurate decomposition compared to the next best approach in novel scenarios. Further, we demonstrate the robustness of TCL on a real-world robotic tray-carrying task.

* 8 pages, 9 figures

A Reachability Tree-Based Algorithm for Robot Task and Motion Planning

Mar 07, 2023

Kanghyun Kim, Daehyung Park, Min Jun Kim

Abstract:This paper presents a novel algorithm for robot task and motion planning (TAMP) problems by utilizing a reachability tree. While tree-based algorithms are known for their speed and simplicity in motion planning (MP), they are not well-suited for TAMP problems that involve both abstracted and geometrical state variables. To address this challenge, we propose a hierarchical sampling strategy, which first generates an abstracted task plan using Monte Carlo tree search (MCTS) and then fills in the details with a geometrically feasible motion trajectory. Moreover, we show that the performance of the proposed method can be significantly enhanced by selecting an appropriate reward for MCTS and by using a pre-generated goal state that is guaranteed to be geometrically feasible. A comparative study using TAMP benchmark problems demonstrates the effectiveness of the proposed approach.

* IEEE International Conference on Robotics and Automation (ICRA) 2023

GraphDistNet: A Graph-based Collision-distance Estimator for Gradient-based Trajectory

Jun 03, 2022

Yeseung Kim, Jinwoo Kim, Daehyung Park

Abstract:Trajectory optimization (TO) aims to find a sequence of valid states while minimizing costs. However, its fine validation process is often costly due to computationally expensive collision searches; otherwise, coarse searches lower the safety of the system, losing a precise solution. To resolve the issues, we introduce a new collision-distance estimator, GraphDistNet, that can precisely encode the structural information between two geometries by leveraging edge feature-based convolutional operations, and also efficiently predict a batch of collision distances and gradients through 25,000 random environments with a maximum of 20 unforeseen objects. Further, we show the adoption of an attention mechanism enables our method to be easily generalized in unforeseen complex geometries toward TO. Our evaluation shows GraphDistNet outperforms state-of-the-art baseline methods in both simulated and real-world tasks.

* 8 pages, 7 figures, submitted to RA-L with IROS 2022 Option

Reactive Task and Motion Planning under Temporal Logic Specifications

Mar 26, 2021

Shen Li, Daehyung Park, Yoonchang Sung, Julie A. Shah, Nicholas Roy

Abstract:We present a task-and-motion planning (TAMP) algorithm robust against a human operator's cooperative or adversarial interventions. Interventions often invalidate the current plan and require replanning on the fly. Replanning can be computationally expensive and often interrupts seamless task execution. We introduce a dynamically reconfigurable planning methodology with behavior tree-based control strategies toward reactive TAMP, which takes advantage of previous plans and incremental graph search during temporal logic-based reactive synthesis. Our algorithm also shows efficient recovery functionalities that minimize the number of replanning steps. Finally, our algorithm produces a robust, efficient, and complete TAMP solution. Our experimental results show the algorithm results in superior manipulation performance in both simulated and real-world tasks.

* 7 pages, 6 figures, Published in IEEE International Conference on Robotics and Automation (ICRA), 2021

Toward Active Robot-Assisted Feeding with a General-Purpose Mobile Manipulator: Design, Evaluation, and Lessons Learned

Apr 07, 2019

Daehyung Park, Yuuna Hoshi, Harshal P. Mahajan, Wendy A. Rogers, Charles C. Kemp

Abstract:Eating is an essential activity of daily living (ADL) for staying healthy and living at home independently. Although numerous assistive devices have been introduced, many people with disabilities are still restricted from independent eating due to the devices' physical or perceptual limitations. In this work, we introduce a new meal-assistance system using a general-purpose mobile manipulator, a Willow Garage PR2, which has the potential to serve as a versatile form of assistive technology. Our active feeding framework enables the robot to autonomously deliver food to the user's mouth. In detail, our web-based user interface, visually-guided behaviors, and safety tools allow people with severe motor impairments to benefit from the robotic assistance. We evaluated our system with 10 able-bodied participants and 9 people with motor impairments. Both groups of participants successfully ate various foods using the system and reported high rates of success for the system's autonomous behaviors in a laboratory environment. Then, we performed in-home evaluation with Henry Evans, a person with quadriplegia, at his house in California, USA. In general, Henry and the other people who operated the system reported that it was comfortable, safe, and easy-to-use. We discuss learned lessons and design insights through user evaluations.

3D Human Pose Estimation on a Configurable Bed from a Pressure Image

Aug 29, 2018

Henry M. Clever, Ariel Kapusta, Daehyung Park, Zackory Erickson, Yash Chitalia, Charles C. Kemp

Abstract:Robots have the potential to assist people in bed, such as in healthcare settings, yet bedding materials like sheets and blankets can make observation of the human body difficult for robots. A pressure-sensing mat on a bed can provide pressure images that are relatively insensitive to bedding materials. However, prior work on estimating human pose from pressure images has been restricted to 2D pose estimates and flat beds. In this work, we present two convolutional neural networks to estimate the 3D joint positions of a person in a configurable bed from a single pressure image. The first network directly outputs 3D joint positions, while the second outputs a kinematic model that includes estimated joint angles and limb lengths. We evaluated our networks on data from 17 human participants with two bed configurations: supine and seated. Our networks achieved a mean joint position error of 77 mm when tested with data from people outside the training set, outperforming several baselines. We also present a simple mechanical model that provides insight into ambiguity associated with limbs raised off of the pressure mat, and demonstrate that Monte Carlo dropout can be used to estimate pose confidence in these situations. Finally, we provide a demonstration in which a mobile manipulator uses our network's estimated kinematic model to reach a location on a person's body in spite of the person being seated in a bed and covered by a blanket.

* 8 pages, 10 figures
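
The Monte Carlo dropout technique mentioned in this abstract can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' networks: the linear model, the function names `predict_with_dropout` and `mc_dropout`, and all parameter values are hypothetical. The idea is to keep dropout active at prediction time, repeat the forward pass, and read the spread of the outputs as a confidence signal.

```python
import random
import statistics


def predict_with_dropout(weights, features, drop_prob, rng):
    """One stochastic forward pass of a toy linear model with inverted dropout."""
    keep_prob = 1.0 - drop_prob
    total = 0.0
    for weight, feature in zip(weights, features):
        # Each input is kept with probability keep_prob and rescaled so the
        # expected output matches the model without dropout.
        if rng.random() < keep_prob:
            total += weight * feature / keep_prob
    return total


def mc_dropout(weights, features, drop_prob=0.2, passes=200, seed=0):
    """Return the mean and sample standard deviation over repeated passes."""
    rng = random.Random(seed)
    samples = [
        predict_with_dropout(weights, features, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Assumed toy inputs; a larger spread suggests a less confident estimate.
    mean, spread = mc_dropout([1.0, 2.0], [3.0, 4.0], drop_prob=0.5)
    print(mean, spread)
```

For a pose estimator, the same procedure would presumably be applied per joint, with a larger spread flagging joints whose position is ambiguous, such as limbs raised off the pressure mat.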
