Abstract: We present a novel approach for long-term human trajectory prediction, which is essential for long-horizon robot planning in human-populated environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60 s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood (NLL) and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged baselines for a time horizon of 60 s.
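The Best-of-20 displacement error reported above is not defined in the abstract itself. A minimal sketch of one common reading follows, assuming the metric is the smallest mean Euclidean distance between the ground-truth trajectory and any of N sampled predictions (N = 20 in the abstract). The function names `displacement_error` and `best_of_n_error`, the 2D point format, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
import math


def displacement_error(pred, truth):
    """Mean Euclidean distance between two equal-length 2D trajectories."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def best_of_n_error(samples, truth):
    """Smallest displacement error over all sampled trajectories.

    Assumption: "Best-of-N" means the best sample is chosen with
    knowledge of the ground truth, as is common for multi-modal predictors.
    """
    if not samples:
        raise ValueError("at least one sampled trajectory is required")
    return min(displacement_error(s, truth) for s in samples)


# Toy data (hypothetical): three sampled trajectories, positions in metres.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],  # exact, then 3 m off at the end
    [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)],  # constant 0.5 m lateral offset
]

best = best_of_n_error(samples, truth)
print(f"best-of-{len(samples)} displacement error: {best:.2f} m")
```

Taking the minimum over samples rewards a predictor for covering the correct mode among several hypotheses, which is presumably why the abstract pairs this metric with the NLL, a measure of the full predicted distribution.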
Abstract: In challenging terrains, constructing structures such as antennas and cable-car masts often requires the use of helicopters to transport loads via ropes. The swinging of the load, exacerbated by wind, impairs positioning accuracy and therefore necessitates precise manual placement by ground crews. This increases costs and the risk of injury. Challenging this paradigm, we present Geranos: a specialized multirotor Unmanned Aerial Vehicle (UAV) designed to enhance aerial transportation and assembly. Geranos accurately positions vertical poles by combining load transport and precise placement in a single vehicle. Its unique ring design mitigates the impact of high pole inertia, while a lightweight two-part grasping mechanism ensures secure load attachment without active force. With four primary propellers countering gravity and four auxiliary ones enhancing lateral precision, Geranos achieves comprehensive position and attitude control around hovering. Our experimental demonstration, which mimics antenna and cable-car mast installations, showcases Geranos' ability to stack poles (3 kg, 2 m long) with sub-5 cm placement accuracy, without the need for manual human intervention.
Abstract: Most object-level mapping systems in use today make use of an upstream learned object instance segmentation model. If we want to teach them about a new object or segmentation class, we need to build a large dataset and retrain the system. To build spatial AI systems that can quickly be taught about new objects, we need to effectively solve the problem of single-shot object detection, instance segmentation and re-identification. So far there is neither a method fulfilling all of these requirements in unison nor a benchmark that could be used to test such a method. Addressing this, we propose ISAR, a benchmark and baseline method for single- and few-shot object Instance Segmentation And Re-identification, in an effort to accelerate the development of algorithms that can robustly detect, segment, and re-identify objects from a single or a few sparse training examples. We provide a semi-synthetic dataset of video sequences with ground-truth semantic annotations, a standardized evaluation pipeline, and a baseline method. Our benchmark aligns with the emerging research trend of unifying Multi-Object Tracking, Video Object Segmentation, and Re-identification.