Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Antonio Loquercio

University of California Berkeley

End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

Nov 08, 2024

Dylan Goetting, Himanshu Gaurav Singh, Antonio Loquercio

Figure 1 for End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

Figure 2 for End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

Figure 3 for End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

Figure 4 for End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

Abstract:We present VLMnav, an embodied framework to transform a Vision-Language Model (VLM) into an end-to-end navigation policy. In contrast to prior work, we do not rely on a separation between perception, planning, and control; instead, we use a VLM to directly select actions in one step. Surprisingly, we find that a VLM can be used as an end-to-end policy zero-shot, i.e., without any fine-tuning or exposure to navigation data. This makes our approach open-ended and generalizable to any downstream navigation task. We run an extensive study to evaluate the performance of our approach in comparison to baseline prompting methods. In addition, we perform a design analysis to understand the most impactful design decisions. Visual examples and code for our project can be found at https://jirl-upenn.github.io/VLMnav/

Via

Access Paper or Ask Questions

Hand-Object Interaction Pretraining from Videos

Sep 12, 2024

Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

Figure 1 for Hand-Object Interaction Pretraining from Videos

Figure 2 for Hand-Object Interaction Pretraining from Videos

Figure 3 for Hand-Object Interaction Pretraining from Videos

Figure 4 for Hand-Object Interaction Pretraining from Videos

Abstract:We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: \url{https://hgaurav2k.github.io/hop/}.

Via

Access Paper or Ask Questions

Tactile-Augmented Radiance Fields

May 07, 2024

Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens

Abstract:We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space. This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene. We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes. Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features. We use these insights to register touch signals to a captured visual scene, and to train a conditional diffusion model that, provided with an RGB-D image rendered from a neural radiance field, generates its corresponding tactile signal. To evaluate our approach, we collect a dataset of TaRFs. This dataset contains more touch samples than previous real-world datasets, and it provides spatially aligned visual signals for each captured touch signal. We demonstrate the accuracy of our cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Project page: https://dou-yiming.github.io/TaRF

* CVPR 2024, Project page: https://dou-yiming.github.io/TaRF, Code: https://github.com/Dou-Yiming/TaRF/

Via

Access Paper or Ask Questions

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Apr 15, 2024

Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

Figure 1 for EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Figure 2 for EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Figure 3 for EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Figure 4 for EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Abstract:Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource to robotic quadruped locomotion, showing that models trained from EgoPet outperform those trained from prior datasets.

* https://www.amirbar.net/egopet

Via

Access Paper or Ask Questions

Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts

Nov 02, 2023

Huang Huang, Satvik Sharma, Antonio Loquercio, Anastasios Angelopoulos, Ken Goldberg, Jitendra Malik

Abstract:This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables. The key idea is the design of switching policies that can take conformal quantiles as input, which we define as conformal policy learning, that allows robots to detect distribution shifts with formal statistical guarantees. We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics, e.g. safety or speed, or directly augmenting a policy observation with a quantile and training it with reinforcement learning. Theoretically, we show that such policies achieve the formal convergence guarantees in finite time. In addition, we thoroughly evaluate their advantages and limitations on two compelling use cases: simulated autonomous driving and active perception with a physical quadruped. Empirical results demonstrate that our approach outperforms five baselines. It is also the simplest of the baseline strategies besides one ablation. Being easy to use, flexible, and with formal guarantees, our work demonstrates how conformal prediction can be an effective tool for sensorimotor learning under uncertainty.

* Conformal Policy Learning

Via

Access Paper or Ask Questions

A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Aug 31, 2023

Spencer Teetaert, Wenda Zhao, Niu Xinyuan, Hashir Zahir, Huiyu Leong, Michel Hidalgo, Gerardo Puga, Tomas Lorente, Nahuel Espinosa, John Alejandro Duarte Carrasco(+14 more)

Figure 1 for A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Figure 2 for A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Figure 3 for A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Figure 4 for A Remote Sim2real Aerial Competition: Fostering Reproducibility and Solutions' Diversity in Robotics Challenges

Abstract:Shared benchmark problems have historically been a fundamental driver of progress for scientific communities. In the context of academic conferences, competitions offer the opportunity to researchers with different origins, backgrounds, and levels of seniority to quantitatively compare their ideas. In robotics, a hot and challenging topic is sim2real-porting approaches that work well in simulation to real robot hardware. In our case, creating a hybrid competition with both simulation and real robot components was also dictated by the uncertainties around travel and logistics in the post-COVID-19 world. Hence, this article motivates and describes an aerial sim2real robot competition that ran during the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, from the specification of the competition task, to the details of the software infrastructure supporting simulation and real-life experiments, to the approaches of the top-placed teams and the lessons learned by participants and organizers.

* 13 pages, 16 figures, 4 tables

Via

Access Paper or Ask Questions

Learning Vision-based Pursuit-Evasion Robot Policies

Aug 30, 2023

Andrea Bajcsy, Antonio Loquercio, Ashish Kumar, Jitendra Malik

Abstract:Learning strategic robot behavior -- like that required in pursuit-evasion interactions -- under real-world constraints is extremely challenging. It requires exploiting the dynamics of the interaction, and planning through both physical state and latent intent uncertainty. In this paper, we transform this intractable problem into a supervised learning problem, where a fully-observable robot policy generates supervision for a partially-observable one. We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy. We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild. Despite all the challenges, the sensing constraints bring about creativity: the robot is pushed to gather information when uncertain, predict intent from noisy measurements, and anticipate in order to intercept. Project webpage: https://abajcsy.github.io/vision-based-pursuit/

* Includes Supplementary. Project webpage at https://abajcsy.github.io/vision-based-pursuit/

Via

Access Paper or Ask Questions

Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight

Jul 12, 2023

Philipp Foehn, Elia Kaufmann, Angel Romero, Robert Penicka, Sihao Sun, Leonard Bauersfeld, Thomas Laengle, Giovanni Cioffi, Yunlong Song, Antonio Loquercio(+1 more)

Abstract:Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open-source and open-hardware and supports both model-based and neural-network--based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, GPU-accelerated compute hardware for real-time perception and neural-network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural-network--based controllers. Our demonstrators include trajectory tracking at up to 5g and 70 km/h in a motion-capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Finally, we demonstrate its use for hardware-in-the-loop simulation in virtual-reality environments. Thanks to its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research.

* Science Robotics Vol. 7, Issue 67, 2022
* 14 pages, 5 figures, 2 tables

Via

Access Paper or Ask Questions

More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

May 02, 2023

Huang Huang, Antonio Loquercio, Ashish Kumar, Neerja Thakkar, Ken Goldberg, Jitendra Malik

Figure 1 for More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

Figure 2 for More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

Figure 3 for More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

Figure 4 for More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

Abstract:Is a manipulator on a legged robot a liability or an asset for locomotion? Prior works mainly designed specific controllers to account for the added payload and inertia from a manipulator. In contrast, biological systems typically benefit from additional limbs, which can simplify postural control. For instance, cats use their tails to enhance the stability of their bodies and prevent falls under disturbances. In this work, we show that a manipulator can be an important asset for maintaining balance during locomotion. To do so, we train a sensorimotor policy using deep reinforcement learning to create a synergy between the robot's limbs. This policy enables the robot to maintain stability despite large disturbances. However, learning such a controller can be quite challenging. To account for these challenges, we propose a stage-wise training procedure to learn complex behaviors. Our proposed method decomposes this complex task into three stages and then incrementally learns these tasks to arrive at a single policy capable of solving the final control task, achieving a success rate up to 2.35 times higher than baselines in simulation. We deploy our learned policy in the real world and show stability during locomotion under strong disturbances.

Via

Access Paper or Ask Questions

Autonomous Drone Racing: A Survey

Jan 05, 2023

Drew Hanover, Antonio Loquercio, Leonard Bauersfeld, Angel Romero, Robert Penicka, Yunlong Song, Giovanni Cioffi, Elia Kaufmann, Davide Scaramuzza

Abstract:Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground which in turn increases productivity and further strengthens their use case. One proxy for developing algorithms used in high-speed navigation is the task of autonomous drone racing, where researchers program drones to fly through a sequence of gates and avoid obstacles as quickly as possible using onboard sensors and limited computational power. Speeds and accelerations exceed over 80 kph and 4 g respectively, raising significant challenges across perception, planning, control, and state estimation. To achieve maximum performance, systems require real-time algorithms that are robust to motion blur, high dynamic range, model uncertainties, aerodynamic disturbances, and often unpredictable opponents. This survey covers the progression of autonomous drone racing across model-based and learning-based approaches. We provide an overview of the field, its evolution over the years, and conclude with the biggest challenges and open questions to be faced in the future.

* 20 pages, submitted to T-RO January 3rd, 2022

Via

Access Paper or Ask Questions