Abstract: We present a novel mission-planning strategy for heterogeneous multi-robot teams that accounts for the specific constraints and capabilities of each robot. Our approach uses hierarchical trees to systematically break down complex missions into manageable sub-tasks. We develop specialized APIs and tools that Large Language Models (LLMs) use to efficiently construct these hierarchical trees. Once the hierarchical tree is generated, it is further decomposed into optimized schedules for each robot that respect that robot's individual constraints and capabilities. We demonstrate the effectiveness of our framework through detailed examples covering a wide range of missions, showcasing its flexibility and scalability.
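As a rough illustration of the final step (turning a mission tree into per-robot schedules), the Python sketch below assumes a tree whose leaves each require exactly one capability and assigns leaves greedily to the least-loaded capable robot. The names `TaskNode`, `Robot`, `leaves`, and `decompose`, and the greedy rule itself, are hypothetical; the abstract does not say how schedules are optimized, and the LLM-driven tree construction is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """Node of a hypothetical mission tree; leaves are executable sub-tasks."""
    name: str
    capability: str | None = None  # capability a leaf requires (assumed: one per leaf)
    duration: float = 0.0          # estimated execution time of a leaf
    children: list[TaskNode] = field(default_factory=list)


@dataclass
class Robot:
    name: str
    capabilities: set[str]


def leaves(node: TaskNode):
    """Yield leaf sub-tasks in depth-first (mission) order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def decompose(root: TaskNode, robots: list[Robot]) -> dict[str, list[str]]:
    """Greedily give each leaf to the least-loaded robot that has the required capability."""
    schedules = {r.name: [] for r in robots}
    load = {r.name: 0.0 for r in robots}
    for task in leaves(root):
        capable = [r for r in robots if task.capability in r.capabilities]
        if not capable:
            raise ValueError(f"no robot can perform {task.name!r}")
        chosen = min(capable, key=lambda r: load[r.name])  # ties go to the first robot listed
        schedules[chosen.name].append(task.name)
        load[chosen.name] += task.duration
    return schedules


mission = TaskNode("inspect_site", children=[
    TaskNode("survey", children=[
        TaskNode("fly_over_area", "fly", 10.0),
        TaskNode("map_ground", "drive", 15.0),
    ]),
    TaskNode("collect_sample", "manipulate", 5.0),
])
team = [Robot("drone", {"fly"}), Robot("rover", {"drive", "manipulate"})]
print(decompose(mission, team))
# {'drone': ['fly_over_area'], 'rover': ['map_ground', 'collect_sample']}
```

The greedy least-loaded rule is only a placeholder for whatever optimization the framework performs; a fuller planner would also handle precedence constraints between sub-tasks assigned to different robots.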
Abstract: A driver warning system, which alerts the human driver to potential risks while driving, is a key feature of advanced driver assistance systems. Existing driver warning technologies, mainly forward collision warning and unsafe lane change warning, can reduce the risk of collisions caused by human error. However, current design methods have several major limitations. First, warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions or the surrounding objects, which reduces the flexibility and generality of the system across scenarios. In addition, warning triggers are mostly rule-based threshold checks on the current state and do not predict potential risk over a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering, over a long horizon, the interactions among the generated warning, the driver's behavior, and the states of the ego and surrounding vehicles. We formulate warning generation as a partially observable Markov decision process (POMDP) and propose an optimal warning generation framework that solves it. Simulation experiments show that the proposed solution outperforms existing warning generation methods.
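To make the POMDP formulation more concrete, the Python sketch below shows the discrete belief update that any POMDP relies on, for a hypothetical two-state driver model (attentive or distracted) with two actions (warn, no_warn) and two observations (brakes, no_reaction). All states, labels, and probabilities are illustrative assumptions; the paper's model also includes ego and surrounding vehicle states, and its long-horizon optimal policy is not shown.

```python
# Hypothetical driver model: hidden driver state, warning action, observed reaction.
STATES = ("attentive", "distracted")

# T[action][s][s_next]: assumed probability that the driver moves from s to s_next.
T = {
    "no_warn": {
        "attentive": {"attentive": 0.9, "distracted": 0.1},
        "distracted": {"attentive": 0.2, "distracted": 0.8},
    },
    "warn": {
        "attentive": {"attentive": 0.95, "distracted": 0.05},
        "distracted": {"attentive": 0.7, "distracted": 0.3},
    },
}

# O[s_next][obs]: assumed probability of observing the driver's reaction in state s_next.
O = {
    "attentive": {"brakes": 0.8, "no_reaction": 0.2},
    "distracted": {"brakes": 0.1, "no_reaction": 0.9},
}


def update_belief(belief, action, obs):
    """Bayes filter step: b'(s') is proportional to O(obs | s') * sum_s T(s' | s, action) * b(s)."""
    predicted = {s2: sum(T[action][s][s2] * belief[s] for s in STATES) for s2 in STATES}
    unnormalized = {s2: O[s2][obs] * predicted[s2] for s2 in STATES}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s2: p / total for s2, p in unnormalized.items()}


b0 = {"attentive": 0.5, "distracted": 0.5}
# Without a warning, seeing no reaction shifts the belief toward "distracted" (about 0.79).
b1 = update_belief(b0, "no_warn", "no_reaction")
# After a warning, seeing the driver brake shifts it strongly toward "attentive".
b2 = update_belief(b0, "warn", "brakes")
```

A POMDP solver chooses actions by reasoning over such beliefs many steps ahead; this one-step filter only shows how evidence about the driver's reaction to a warning is folded into the belief.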
Abstract: We study the problem of estimating the body movements of a camera wearer from egocentric videos. Current methods for ego-body pose estimation rely on temporally dense sensor data, such as IMU measurements from spatially sparse body parts like the head and hands. We argue, however, that even temporally sparse observations, such as hand poses captured intermittently from egocentric videos during natural or periodic hand movements, can effectively constrain overall body motion. Naively applying diffusion models to generate full-body pose from the head pose and sparse hand poses leads to suboptimal results. To overcome this, we develop a two-stage approach that decomposes the problem into temporal completion and spatial completion. First, our method employs masked autoencoders to impute hand trajectories by leveraging the spatiotemporal correlations between the head pose sequence and the intermittent hand poses, while also providing uncertainty estimates. Then, conditional diffusion models generate plausible full-body motions from these temporally dense head and hand trajectories, guided by the uncertainty estimates from the imputation. We validate the effectiveness of our method through comprehensive experiments on various head-mounted display (HMD) setups using the AMASS and Ego-Exo4D datasets.
Abstract: Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing the interpretability and out-of-distribution (OOD) generalizability of prediction models. This work addresses these two challenges by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and OOD performance by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance over existing methods in modeling spatio-temporal relations, predicting motion, and identifying time-invariant latent features.
Abstract: The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, achieving such trustworthiness and interpretability is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using a variety of closed- and open-ended visual question answering formats, the dataset provides dense annotations of the semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers working on visual scene understanding and related fields. Further, we introduce a model for jointly ranking importance levels and generating natural language captions to benchmark our dataset, and we demonstrate its performance with quantitative evaluations.
Abstract: Effectively understanding dynamically evolving multiagent interactions is crucial to capturing the underlying behavior of agents in social systems. These interactions are usually difficult to observe directly, so modeling the latent interactions is essential for understanding the complex behaviors they produce. Recent work on Dynamic Neural Relational Inference (DNRI) captures explicit inter-agent interactions at every step. However, prediction at every step results in noisy interactions and lacks intrinsic interpretability without post-hoc inspection. Moreover, analyzing the predicted interactions requires ground-truth annotations, which are hard to obtain. This paper introduces DIDER, Discovering Interpretable Dynamically Evolving Relations, a generic end-to-end interaction modeling framework with intrinsic interpretability. DIDER discovers an interpretable sequence of inter-agent interactions by disentangling the task of latent interaction prediction into sub-interaction prediction and duration estimation. By enforcing the consistency of a sub-interaction type over an extended time duration, the proposed framework achieves intrinsic interpretability without requiring any post-hoc inspection. We evaluate DIDER on both synthetic and real-world datasets. The experimental results demonstrate that modeling disentangled and interpretable dynamic relations improves performance on trajectory forecasting tasks.
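The core representational idea here is to describe interactions as a sequence of sub-interaction types, each held for some duration, rather than as an independent label at every step. The Python sketch below illustrates only that data representation, with hypothetical interaction labels and helper names (`expand_segments`, `to_segments`); it does not model how DIDER predicts the types or estimates the durations.

```python
from itertools import groupby


def expand_segments(segments):
    """Turn [(sub_interaction_type, duration), ...] into one label per time step."""
    return [label for label, duration in segments for _ in range(duration)]


def to_segments(per_step_labels):
    """Run-length encode per-step labels into (sub_interaction_type, duration) segments."""
    return [(label, len(list(group))) for label, group in groupby(per_step_labels)]


# Segment-level description: each sub-interaction type persists over a duration.
segments = [("follow", 3), ("yield", 2), ("overtake", 1)]
per_step = expand_segments(segments)
# ['follow', 'follow', 'follow', 'yield', 'yield', 'overtake']

# A noisy per-step prediction fragments into many short, hard-to-interpret segments.
noisy = ["follow", "yield", "follow", "follow", "yield", "yield"]
print(to_segments(noisy))
# [('follow', 1), ('yield', 1), ('follow', 2), ('yield', 2)]
```

Predicting a type together with a duration rules out the one-step flicker seen in `noisy` by construction, which is the kind of temporal consistency the abstract credits for interpretability.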
Abstract: Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their strong performance and their ability to capture the multimodality of trajectory distributions. In this work, we study the joint trajectory prediction problem within the goal-conditioned framework. In particular, we introduce a model based on a conditional variational autoencoder (CVAE) that explicitly encodes different interaction modes into the latent space. However, we find that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address this issue, we propose a novel approach that avoids KL vanishing and induces an interpretable interactive latent space with pseudo labels. The pseudo labels allow us to incorporate arbitrary domain knowledge about interactions. We motivate the proposed method with an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.
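For reference, a standard CVAE is trained by maximizing the evidence lower bound below. The notation is ours (X for the observed context, Y for the future joint trajectories, z for the latent interaction mode); the abstract does not give the paper's exact objective.

```latex
\log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{q_\phi(z \mid X, Y)}\!\left[\log p_\theta(Y \mid X, z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid X, Y) \,\|\, p_\theta(z \mid X)\right)
```

Posterior collapse (KL vanishing) corresponds to the KL term shrinking to zero for all inputs: the approximate posterior then matches the prior, so z carries no information about Y beyond what X already provides, and the decoder learns to ignore it.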
Abstract: This paper presents the design of a crawler robot that can transform its chassis from an OmniCrawler mode to a large-wheel mode using a novel mechanism. The transformation requires no additional actuators. Notably, the robot transforms into a large-diameter, narrow wheel, which improves its maneuverability (e.g., a small turning radius) and enables fast, efficient locomotion. This paper extends the locomotion modes of CObRaSO, a previously developed hybrid compliant OmniCrawler robot: in addition to its legged and tracked modes, CObRaSO can now operate in a large-wheel mode, which expands its locomotion capabilities. The paper describes the robot's mechanical design in detail and presents the transformation experiment and a torque analysis.
Abstract: This paper presents a novel design of an omnidirectionally bendable OmniCrawler module, CObRaSO. In addition to longitudinal crawling and sideways rolling, the OmniCrawler's performance is further enhanced by introducing omnidirectional bending within the module, which is the key contribution of this paper. Omnidirectional bending is achieved by arranging two independent 1-DOF joints at 90° with respect to each other. The module's unique characteristic is its ability to crawl in an omnidirectionally bent configuration, which is achieved through a novel 2-DOF roller chain and a backbone with a hybrid soft-rigid structure. This hybrid structure provides compliant pathways that let the lug-chain assembly passively conform to the orientation of the module and crawl in an omnidirectionally bent configuration, making the module one of a kind. Furthermore, we show that the modular design of CObRaSO makes it versatile: it achieves active compliance on uneven surfaces, can be applied to different robotic platforms (an in-pipeline robot, a quadruped, and a snake robot), and exhibits hybrid locomotion modes in various robot configurations. The mechanism and mobility characteristics of the proposed module are verified through simulations and experiments on a real robot prototype.
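As a kinematic illustration of why two 1-DOF joints set at 90° to each other allow bending in any direction, the Python sketch below assumes the module's longitudinal axis is x, the first joint yaws about z, and the second pitches about the rotated y axis, so the distal axis is R_z(yaw) R_y(pitch) applied to x. These axis conventions, the joint order, and the function names are our assumptions rather than details from the paper; joint limits and the crawling chain are ignored.

```python
import math


def rot_y(theta):
    """Rotation matrix about the y axis (pitch joint)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(psi):
    """Rotation matrix about the z axis (yaw joint)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def bend_direction(yaw, pitch):
    """Unit vector of the distal module axis: R_z(yaw) @ R_y(pitch) @ x_hat."""
    return matvec(rot_z(yaw), matvec(rot_y(pitch), [1.0, 0.0, 0.0]))


def joint_angles(direction):
    """Inverse kinematics for a unit direction, choosing the solution with |pitch| <= 90 degrees."""
    dx, dy, dz = direction
    pitch = -math.asin(max(-1.0, min(1.0, dz)))  # clamp guards against rounding error
    yaw = math.atan2(dy, dx)
    return yaw, pitch


# Bend 30 degrees toward +y and 20 degrees toward +z
# (with this sign convention, negative pitch tilts the axis toward +z).
d = bend_direction(math.radians(30), math.radians(-20))
print([round(c, 3) for c in d])
```

Because every unit direction with a non-negative x component maps back to a (yaw, pitch) pair, the two orthogonal joints together cover bending in any direction within their limits.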
Abstract: This paper discusses the design of a novel compliant in-pipe climbing modular robot for small-diameter pipes. The robot consists of a kinematic chain of 3 OmniCrawler modules, with a link connecting each pair of adjacent modules via compliant joints. The tank-like crawler mechanism provides good traction on low-friction surfaces, and its circular cross-section makes the robot holonomic. Holonomic motion lets the robot re-align itself to avoid obstacles during motion and to negotiate turns in a minimal-energy posture. Additionally, the modular design enables it to negotiate T-junctions without motion singularity. The compliance is realized using 4 torsion springs placed in the joints that connect the 3 modules with the 2 links. For a desired pipe diameter (Ø75 mm), the springs' stiffness values are obtained by formulating a constrained optimization problem, which is simulated in MSC ADAMS and further validated on a real robot prototype. To negotiate smooth vertical bends and variations in pipe friction coefficient, the design was later modified by replacing the springs at 2 of the 4 joints with series elastic actuators (SEAs).