Abstract: Identifying predictive world models for robots from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable simulators to identify world models cannot jointly optimize the shape, appearance, and physical properties of the scene. In this work, we introduce a novel object representation that allows the joint identification of these properties. Our method employs a differentiable point-based object representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, this representation enables end-to-end optimization of world models, given sparse visual and tactile observations of a physical motion sequence. Through a series of system identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only one robot action sequence.
Abstract: Identifying predictive world models for robots from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable simulators to identify world models cannot jointly optimize the shape, appearance, and physical properties of the scene. In this work, we introduce a novel object representation that allows the joint identification of these properties. Our method employs a differentiable point-based object representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, this representation enables end-to-end optimization of world models, given sparse visual and tactile observations of a physical motion sequence. Through a series of benchmark system identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only a few partial observations.
Abstract: Two-view pose estimation is essential for map-free visual relocalization and object pose tracking tasks. However, traditional matching methods suffer from time-consuming robust estimators, while deep learning-based pose regressors only cater to camera-to-world pose estimation, lacking generalizability to different image sizes and camera intrinsics. In this paper, we propose SRPose, a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios. SRPose consists of a sparse keypoint detector, an intrinsic-calibration position encoder, and promptable prior knowledge-guided attention layers. Given two RGB images of a fixed scene or a moving object, SRPose estimates the relative camera or 6D object pose transformation. Extensive experiments demonstrate that SRPose achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed, showing generalizability to both scenarios. It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
Abstract: We study the problem of Trajectory Optimization (TO) for a general class of stiff and constrained dynamic systems. We establish a set of mild assumptions under which we show that TO converges in a numerically stable manner to a locally optimal and feasible solution, up to an arbitrary user-specified error tolerance. Our key observation is that all prior works use SQP as a black-box solver, where a TO problem is formulated as a Nonlinear Program (NLP) and the underlying SQP solver is not allowed to modify the NLP. Instead, we propose a white-box TO solver, where the SQP solver is informed of the characteristics of the objective function and the dynamic system. It then uses these characteristics to derive approximate dynamic systems and customize the discretization schemes.
Abstract: Extensive research has been devoted to the field of multi-agent navigation. Recently, there has been remarkable progress attributed to the emergence of learning-based techniques with substantially elevated intelligence and realism. Nonetheless, prevailing learned models face limitations in terms of scalability and effectiveness, primarily due to their agent-centric nature, i.e., the learned neural policy is individually deployed on each agent. Inspired by the efficiency observed in real-world traffic networks, we present an environment-centric navigation policy. Our method learns a set of traffic rules to coordinate a vast group of unintelligent agents that possess only basic collision-avoidance capabilities. Our method segments the environment into distinct blocks and parameterizes the traffic rule using a Graph Recurrent Neural Network (GRNN) over the block network. Each GRNN node is trained to modulate the velocities of agents as they pass through its block. Using either Imitation Learning (IL) or Reinforcement Learning (RL) schemes, we demonstrate that our neural traffic rules, which closely resemble real-world traffic regulations, are effective at resolving agent congestion. Our method handles up to $240$ agents in real time and generalizes across diverse agent and environment configurations.
Abstract: Finding robot poses and trajectories is a foundational aspect of robot motion planning. Despite decades of research, efficiently and robustly addressing these challenges is still difficult. Existing approaches are often plagued by various limitations, such as intricate geometric approximations, violations of collision constraints, or slow first-order convergence. In this paper, we introduce two novel optimization formulations that offer provable robustness, achieving second-order convergence while requiring only a convex approximation of the robot's links and obstacles. Our first method, the Explicit Collision Barrier (ECB) method, employs a barrier function to guarantee separation between convex objects. ECB uses an efficient matrix factorization technique, enabling a second-order Newton's method with a per-iteration complexity linear in the number of separating planes. Our second method, the Implicit Collision Barrier (ICB) method, further transforms the separating planes into implicit functions of robot poses. We show that such an implicit objective function is twice-differentiable, with derivatives that can be evaluated at linear complexity. To assess the effectiveness of our approaches, we conduct a comparative study with a first-order baseline algorithm across six testing scenarios. Our results show that our methods converge significantly faster than the baseline algorithm.
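As a rough illustration of the barrier idea described above, the following Python sketch evaluates a log-barrier energy for two convex vertex sets separated by a plane. The function name `separation_barrier`, the plane parameterization `(normal, offset)`, and the plain logarithmic barrier are assumptions made for illustration; the abstract does not specify the exact formulation used by ECB.

```python
import math


def separation_barrier(points_a, points_b, normal, offset):
    """Log-barrier energy for the plane {x : normal . x = offset}.

    Hypothetical convention: vertices of A must lie strictly on the
    positive side of the plane and vertices of B strictly on the
    negative side. Returns math.inf if any vertex violates this.
    """
    energy = 0.0
    for p in points_a:
        # Signed distance-like margin of an A vertex (positive is feasible).
        margin = sum(n * x for n, x in zip(normal, p)) - offset
        if margin <= 0.0:
            return math.inf
        energy -= math.log(margin)
    for p in points_b:
        # Margin of a B vertex, measured on the opposite side of the plane.
        margin = offset - sum(n * x for n, x in zip(normal, p))
        if margin <= 0.0:
            return math.inf
        energy -= math.log(margin)
    return energy


# Two axis-aligned squares on either side of the plane x = 0.
square_a = [(1.0, 0.0), (2.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
square_b = [(-1.0, 0.0), (-2.0, 0.0), (-1.0, 1.0), (-2.0, 1.0)]
energy = separation_barrier(square_a, square_b, (1.0, 0.0), 0.0)
```

Because the energy grows without bound as any vertex approaches the plane, a minimizer that starts from a separated configuration and keeps the energy finite never lets the two convex shapes touch, which is the sense in which a barrier can guarantee separation.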
Abstract: Deformable robots are notoriously difficult to model or control due to their high-dimensional configuration spaces. Direct trajectory optimization suffers from the curse of dimensionality and incurs a high computational cost, while learning-based controller optimization methods are sensitive to hyper-parameter tuning. To overcome these limitations, we hypothesize that high-fidelity soft robots can be both simulated and controlled by restricting to low-dimensional spaces. Under this assumption, we propose a two-stage algorithm to identify such simulation- and control-spaces. Our method first identifies the so-called simulation-space that captures the salient deformation modes, to which the robot's governing equation is restricted. We then identify the control-space, to which control signals are restricted. We propose a multi-fidelity Riemannian Bayesian bilevel optimization to identify task-specific control spaces. We show that the dimension of the control-space can be less than $10$ for a high-DOF soft robot to accomplish walking and swimming tasks, allowing low-dimensional MPC controllers to be applied to soft robots with tractable computational complexity.
Abstract: 2D irregular shape packing is a necessary step to arrange UV patches of a 3D model within a texture atlas for memory-efficient appearance rendering in computer graphics. Being a joint, combinatorial decision-making problem involving all patch positions and orientations, this problem is well known to be NP-hard. Prior solutions either assume a heuristic packing order or modify the upstream mesh cut and UV mapping to simplify the problem, which either limits the packing ratio or incurs robustness or generality issues. Instead, we introduce a learning-assisted 2D irregular shape packing method that achieves a high packing quality with minimal requirements from the input. Our method iteratively selects and groups subsets of UV patches into near-rectangular super patches, essentially reducing the problem to bin-packing, based on which a joint optimization is employed to further improve the packing ratio. To deal efficiently with large problem instances with hundreds of patches, we train deep neural policies to predict nearly rectangular patch subsets and determine their relative poses, leading to linear time scaling with the number of patches. We demonstrate the effectiveness of our method on three datasets for UV packing, where our method achieves a higher packing ratio than several widely used baselines with competitive computational speed.
Abstract: We present a lightweight, decentralized algorithm for navigating multiple nonholonomic agents through challenging environments with narrow passages. Our key idea is to allow agents to yield to each other in large open areas instead of narrow passages, to increase the success rate of conventional decentralized algorithms. At pre-processing time, our method computes a medial axis for the free space. A reference trajectory is then computed and projected onto the medial axis for each agent. During run time, when an agent senses other agents moving in the opposite direction, our algorithm uses the medial axis to estimate a Point of Impact (POI) as well as the available area around the POI. If the area around the POI is not large enough for yielding behaviors to be successful, we shift the POI to nearby large areas by modulating the agent's reference trajectory and traveling speed. We evaluate our method on a set of 4 environments with up to 15 robots, and we find our method incurs a marginal computational overhead of 10-30 ms on average, achieving real-time performance. Our planned reference trajectories can then be tracked using local navigation algorithms to achieve up to a $100\%$ higher success rate than local navigation algorithms alone.
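To make the Point of Impact notion more concrete, here is a minimal Python sketch under strong simplifying assumptions: both agents travel along one shared path (standing in for a medial-axis branch) at constant speeds, and positions are given as arc lengths along that path. The function `estimate_poi` and this constant-speed model are hypothetical and are not taken from the paper.

```python
def estimate_poi(s_a, v_a, s_b, v_b):
    """Estimate where two oncoming agents meet on a shared path.

    s_a, s_b: arc-length positions along the path, with s_a <= s_b.
    v_a: speed of agent A moving toward increasing arc length.
    v_b: speed of agent B moving toward decreasing arc length.
    Returns (meeting arc length, time to meet), or None if the
    agents are not approaching each other.
    """
    gap = s_b - s_a
    closing_speed = v_a + v_b
    if gap < 0.0 or closing_speed <= 0.0:
        return None
    time_to_meet = gap / closing_speed
    return s_a + v_a * time_to_meet, time_to_meet


# Agents 10 units apart, approaching at unit speed each.
poi = estimate_poi(0.0, 1.0, 10.0, 1.0)
```

In this toy model, slowing one agent (a smaller `v_a`) moves the estimated meeting point closer to that agent's current position, which is one simple way to picture how modulating traveling speed can shift the POI toward a larger open area.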
Abstract: We present a semi-infinite program (SIP) solver for trajectory optimization of general articulated robots. These problems are more challenging than standard Nonlinear Programs (NLPs) because they involve an infinite number of non-convex collision constraints. Prior SIP solvers based on constraint sampling cannot guarantee the satisfaction of all constraints. Instead, our method uses a conservative bound on articulated body motions to ensure solution feasibility throughout the optimization procedure. We further use subdivision to adaptively reduce the error in conservative motion estimation. Combined, we prove that our SIP solver guarantees feasibility while approaching the critical point of SIP problems up to arbitrary user-provided precision. We have verified our method on a series of trajectory optimization problems involving industrial robot arms and UAVs, where our method can generate collision-free, locally optimal trajectories within a couple of minutes.
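The combination of a conservative bound with adaptive subdivision can be sketched on a one-dimensional toy problem. The Python code below tries to certify that a scalar constraint g(t) > 0 holds for every t in an interval, using an assumed Lipschitz constant as the conservative bound and bisecting wherever that bound is too loose. The function `certify_positive`, the Lipschitz-based bound, and the depth limit are illustrative assumptions and do not reproduce the paper's bound on articulated body motions.

```python
def certify_positive(g, lo, hi, lipschitz, max_depth=30):
    """Try to certify g(t) > 0 for all t in [lo, hi].

    lipschitz: assumed upper bound on |g'(t)| over the interval.
    Returns True if the constraint is certified everywhere, and False
    if a violation is sampled or the depth limit is exhausted.
    """
    mid = 0.5 * (lo + hi)
    half_width = 0.5 * (hi - lo)
    value = g(mid)
    if value <= 0.0:
        return False  # the constraint is violated at the midpoint
    if value - lipschitz * half_width > 0.0:
        return True  # conservative lower bound is positive on [lo, hi]
    if max_depth == 0:
        return False  # bound still too loose; refuse to certify
    # Subdivide: halving the interval halves the conservative error.
    return (certify_positive(g, lo, mid, lipschitz, max_depth - 1)
            and certify_positive(g, mid, hi, lipschitz, max_depth - 1))


# Clearance that dips to 0.1 at t = 0.5; |g'| <= 1 on [0, 1].
safe = certify_positive(lambda t: (t - 0.5) ** 2 + 0.1, 0.0, 1.0, 1.0)
# Clearance that reaches zero at t = 0.5.
unsafe = certify_positive(lambda t: t - 0.5, 0.0, 1.0, 1.0)
```

Unlike checking the constraint only at sampled instants, this scheme never reports success unless the bound covers the whole interval, at the price of extra subdivisions where the clearance is small.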