Abstract:Recent advancements in robot tool use have unlocked their usage for novel tasks, yet the predominant focus is on rigid-body tools, while the investigation of soft-body tools and their dynamic interaction with rigid bodies remains unexplored. This paper takes a pioneering step towards dynamic one-shot soft tool use for manipulating rigid objects, a challenging problem posed by complex interactions and unobservable physical properties. To address these problems, we propose the Implicit Physics-aware (IPA) policy, designed to facilitate effective soft tool use across various environmental configurations. The IPA policy conducts system identification to implicitly identify physics information and predict goal-conditioned, one-shot actions accordingly. We validate our approach through a challenging task, i.e., transporting rigid objects using soft tools such as ropes to distant target positions in a single attempt under unknown environment physics parameters. Our experimental results indicate the effectiveness of our method in efficiently identifying physical properties, accurately predicting actions, and smoothly generalizing to real-world environments. The related video is available at: https://youtu.be/4hPrUDTc4Rg?si=WUZrT2vjLMt8qRWA
Abstract:Neural Signed Distance Fields (SDFs) provide a differentiable environment representation to readily obtain collision checks and well-defined gradients for robot navigation tasks. However, updating neural SDFs as the scene evolves entails re-training, which is tedious, time consuming, and inefficient, making it unsuitable for robot navigation with limited field-of-view in dynamic environments. Towards this objective, we propose a compositional framework of neural SDFs to solve robot navigation in indoor environments using only an onboard RGB-D sensor. Our framework embodies a dual mode procedure for trajectory optimization, with different modes using complementary methods of modeling collision costs and collision avoidance gradients. The primary stage queries the robot body's SDF, swept along the route to goal, at the obstacle point cloud, enabling swift local optimization of trajectories. The secondary stage infers the visible scene's SDF by aligning and composing the SDF representations of its constituents, providing better informed costs and gradients for trajectory optimization. The dual mode procedure combines the best of both stages, achieving a success rate of 98%, 14.4% higher than baseline with comparable amortized plan time on iGibson 2.0. We also demonstrate its effectiveness in adapting to real-world indoor scenarios.
Abstract:Retrieving target objects from unknown, confined spaces remains a challenging task that requires integrated, task-driven active sensing and rearrangement planning. Previous approaches have independently addressed active sensing and rearrangement planning, limiting their practicality in real-world scenarios. This paper presents a new, integrated heuristic-based active sensing and Monte-Carlo Tree Search (MCTS)-based retrieval planning approach. These components provide feedback to one another to actively sense critical, unobserved areas suitable for the retrieval planner to plan a sequence for relocating path-blocking obstacles and a collision-free trajectory for retrieving the target object. We demonstrate the effectiveness of our approach using a robot arm equipped with an in-hand camera in both simulated and real-world confined, cluttered scenarios. Our framework is compared against various state-of-the-art methods. The results indicate that our proposed approach outperforms baseline methods by a significant margin in terms of the success rate, the object rearrangement planning time consumption and the number of planning trials before successfully retrieving the target. Videos can be found at https://youtu.be/tea7I-3RtV0.
Abstract:Mapping and motion planning are two essential elements of robot intelligence that are interdependent in generating environment maps and navigating around obstacles. The existing mapping methods create maps that require computationally expensive motion planning tools to find a path solution. In this paper, we propose a new mapping feature called arrival time fields, which is a solution to the Eikonal equation. The arrival time fields can directly guide the robot in navigating the given environments. Therefore, this paper introduces a new approach called Active Neural Time Fields (Active NTFields), which is a physics-informed neural framework that actively explores the unknown environment and maps its arrival time field on the fly for robot motion planning. Our method does not require any expert data for learning and uses neural networks to directly solve the Eikonal equation for arrival time field mapping and motion planning. We benchmark our approach against state-of-the-art mapping and motion planning methods and demonstrate its superior performance in both simulated and real-world environments with a differential drive robot and a 6 degrees-of-freedom (DOF) robot manipulator. The supplementary videos can be found at https://youtu.be/qTPL5a6pRKk, and the implementation code repository is available at https://github.com/Rtlyc/antfields-demo.
Abstract:Constrained Motion Planning (CMP) aims to find a collision-free path between the given start and goal configurations on the kinematic constraint manifolds. These problems appear in various scenarios ranging from object manipulation to legged-robot locomotion. However, the zero-volume nature of manifolds makes the CMP problem challenging, and the state-of-the-art methods still take several seconds to find a path and require a computationally expansive path dataset for imitation learning. Recently, physics-informed motion planning methods have emerged that directly solve the Eikonal equation through neural networks for motion planning and do not require expert demonstrations for learning. Inspired by these approaches, we propose the first physics-informed CMP framework that solves the Eikonal equation on the constraint manifolds and trains neural function for CMP without expert data. Our results show that the proposed approach efficiently solves various CMP problems in both simulation and real-world, including object manipulation under orientation constraints and door opening with a high-dimensional 6-DOF robot manipulator. In these complex settings, our method exhibits high success rates and finds paths in sub-seconds, which is many times faster than the state-of-the-art CMP methods.
Abstract:Rearrangement planning for object retrieval tasks from confined spaces is a challenging problem, primarily due to the lack of open space for robot motion and limited perception. Several traditional methods exist to solve object retrieval tasks, but they require overhead cameras for perception and a time-consuming exhaustive search to find a solution and often make unrealistic assumptions, such as having identical, simple geometry objects in the environment. This paper presents a neural object retrieval framework that efficiently performs rearrangement planning of unknown, arbitrary objects in confined spaces to retrieve the desired object using a given robot grasp. Our method actively senses the environment with the robot's in-hand camera. It then selects and relocates the non-target objects such that they do not block the robot path homotopy to the target object, thus also aiding an underlying path planner in quickly finding robot motion sequences. Furthermore, we demonstrate our framework in challenging scenarios, including real-world cabinet-like environments with arbitrary household objects. The results show that our framework achieves the best performance among all presented methods and is, on average, two orders of magnitude computationally faster than the best-performing baselines.
Abstract:Language-guided active sensing is a robotics subtask where a robot with an onboard sensor interacts efficiently with the environment via object manipulation to maximize perceptual information, following given language instructions. These tasks appear in various practical robotics applications, such as household service, search and rescue, and environment monitoring. Despite many applications, the existing works do not account for language instructions and have mainly focused on surface sensing, i.e., perceiving the environment from the outside without rearranging it for dense sensing. Therefore, in this paper, we introduce the first language-guided active sensing approach that allows users to observe specific parts of the environment via object manipulation. Our method spatially associates the environment with language instructions, determines the best camera viewpoints for perception, and then iteratively selects and relocates the best view-blocking objects to provide the dense perception of the region of interest. We evaluate our method against different baseline algorithms in simulation and also demonstrate it in real-world confined cabinet-like settings with multiple unknown objects. Our results show that the proposed method exhibits better performance across different metrics and successfully generalizes to real-world complex scenarios.
Abstract:Constrained robot motion planning is a ubiquitous need for robots interacting with everyday environments, but it is a notoriously difficult problem to solve. Many sampled points in a sample-based planner need to be rejected as they fall outside the constraint manifold, or require significant iterative effort to correct. Given this, few solutions exist that present a constraint-satisfying trajectory for robots, in reasonable time and of low path cost. In this work, we present a transformer-based model for motion planning with task space constraints for manipulation systems. Vector Quantized-Motion Planning Transformer (VQ-MPT) is a recent learning-based model that reduces the search space for unconstrained planning for sampling-based motion planners. We propose to adapt a pre-trained VQ-MPT model to reduce the search space for constraint planning without retraining or finetuning the model. We also propose to update the neural network output to move sampling regions closer to the constraint manifold. Our experiments show how VQ-MPT improves planning times and accuracy compared to traditional planners in simulated and real-world environments. Unlike previous learning methods, which require task-related data, our method uses pre-trained neural network models and requires no additional data for training and finetuning the model making this a \textit{one-shot} process. We also tested our method on a physical Franka Panda robot with real-world sensor data, demonstrating the generalizability of our algorithm. We anticipate this approach to be an accessible and broadly useful for transferring learned neural planners to various robotic-environment interaction scenarios.
Abstract:Anytime 3D human pose forecasting is crucial to synchronous real-world human-machine interaction, where the term ``anytime" corresponds to predicting human pose at any real-valued time step. However, to the best of our knowledge, all the existing methods in human pose forecasting perform predictions at preset, discrete time intervals. Therefore, we introduce AnyPose, a lightweight continuous-time neural architecture that models human behavior dynamics with neural ordinary differential equations. We validate our framework on the Human3.6M, AMASS, and 3DPW dataset and conduct a series of comprehensive analyses towards comparison with existing methods and the intersection of human pose and neural ordinary differential equations. Our results demonstrate that AnyPose exhibits high-performance accuracy in predicting future poses and takes significantly lower computational time than traditional methods in solving anytime prediction tasks.
Abstract:Heterogeneous systems manipulation, i.e., manipulating rigid objects via deformable (soft) objects, is an emerging field that remains in its early stages of research. Existing works in this field suffer from limited action and operational space, poor generalization ability, and expensive development. To address these challenges, we propose a universally applicable and effective moving primitive, Iterative Grasp-Pull (IGP), and a sample-based framework, DeRi-IGP, to solve the heterogeneous system manipulation task. The DeRi-IGP framework uses local onboard robots' RGBD sensors to observe the environment, comprising a soft-rigid body system. It then uses this information to iteratively grasp and pull a soft body (e.g., rope) to move the attached rigid body to a desired location. We evaluate the effectiveness of our framework in solving various heterogeneous manipulation tasks and compare its performance with several state-of-the-art baselines. The result shows that DeRi-IGP outperforms other methods by a significant margin. In addition, we also demonstrate the advantage of the large operational space of IGP in the long-distance object acquisition task within both simulated and real environments.