Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Snyder

Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

Oct 05, 2025

Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar

Figure 1 for Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

Figure 2 for Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

Figure 3 for Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

Figure 4 for Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

Abstract:Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide-range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. Typically in practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework to augment large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned $\pi_0$ on a joint distribution of objects and initial conditions, and find that our approach saves over $20-25\%$ of hardware evaluation effort to achieve similar bounds on policy performance.

Via

Access Paper or Ask Questions

Guiding Data Collection via Factored Scaling Curves

May 12, 2025

Lihan Zha, Apurva Badithela, Michael Zhang, Justin Lidard, Jeremy Bao, Emily Zhou, David Snyder, Allen Z. Ren, Dhruv Shah, Anirudha Majumdar

Abstract:Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors) $-$ a prohibitively expensive undertaking, if done exhaustively. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance varies as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. We evaluate the proposed method through extensive simulated and real-world experiments, across both training-from-scratch and fine-tuning settings, and show that it boosts success rates in real-world tasks in new environments by up to 26% over existing data-collection strategies. We further demonstrate how factored scaling curves can effectively guide data collection using an offline metric, without requiring real-world evaluation at scale.

* Project website: https://factored-data-scaling.github.io

Via

Access Paper or Ask Questions

Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping

Mar 14, 2025

David Snyder, Asher James Hancock, Apurva Badithela, Emma Dixon, Patrick Miller, Rares Andrei Ambrus, Anirudha Majumdar, Masha Itkina, Haruki Nishimura

Abstract:Imitation learning has enabled robots to perform complex, long-horizon tasks in challenging dexterous manipulation settings. As new methods are developed, they must be rigorously evaluated and compared against corresponding baselines through repeated evaluation trials. However, policy comparison is fundamentally constrained by a small feasible sample size (e.g., 10 or 50) due to significant human effort and limited inference throughput of policies. This paper proposes a novel statistical framework for rigorously comparing two policies in the small sample size regime. Prior work in statistical policy comparison relies on batch testing, which requires a fixed, pre-determined number of trials and lacks flexibility in adapting the sample size to the observed evaluation data. Furthermore, extending the test with additional trials risks inducing inadvertent p-hacking, undermining statistical assurances. In contrast, our proposed statistical test is sequential, allowing researchers to decide whether or not to run more trials based on intermediate results. This adaptively tailors the number of trials to the difficulty of the underlying comparison, saving significant time and effort without sacrificing probabilistic correctness. Extensive numerical simulation and real-world robot manipulation experiments show that our test achieves near-optimal stopping, letting researchers stop evaluation and make a decision in a near-minimal number of trials. Specifically, it reduces the number of evaluation trials by up to 40% as compared to state-of-the-art baselines, while preserving the probabilistic correctness and statistical power of the comparison. Moreover, our method is strongest in the most challenging comparison instances (requiring the most evaluation trials); in a multi-task comparison scenario, we save the evaluator more than 200 simulation rollouts.

* 19 pages, 10 figures. In submission

Via

Access Paper or Ask Questions

Privacy-Preserving Map-Free Exploration for Confirming the Absence of a Radioactive Source

Feb 27, 2024

Eric Lepowsky, David Snyder, Alexander Glaser, Anirudha Majumdar

Abstract:Performing an inspection task while maintaining the privacy of the inspected site is a challenging balancing act. In this work, we are motivated by the future of nuclear arms control verification, which requires both a high level of privacy and guaranteed correctness. For scenarios with limitations on sensors and stored information due to the potentially secret nature of observable features, we propose a robotic verification procedure that provides map-free exploration to perform a source verification task without requiring, nor revealing, any task-irrelevant, site-specific information. We provide theoretical guarantees on the privacy and correctness of our approach, validated by extensive simulated and hardware experiments.

* 10 pages, 6 figures, in submission

Via

Access Paper or Ask Questions

Online Learning for Obstacle Avoidance

Jun 14, 2023

David Snyder, Meghan Booker, Nathaniel Simon, Wenhan Xia, Daniel Suo, Elad Hazan, Anirudha Majumdar

Figure 1 for Online Learning for Obstacle Avoidance

Figure 2 for Online Learning for Obstacle Avoidance

Figure 3 for Online Learning for Obstacle Avoidance

Figure 4 for Online Learning for Obstacle Avoidance

Abstract:We approach the fundamental problem of obstacle avoidance for robotic systems via the lens of online learning. In contrast to prior work that either assumes worst-case realizations of uncertainty in the environment or a stationary stochastic model of uncertainty, we propose a method that is efficient to implement and provably grants instance-optimality with respect to perturbations of trajectories generated from an open-loop planner (in the sense of minimizing worst-case regret). The resulting policy adapts online to realizations of uncertainty and provably compares well with the best obstacle avoidance policy in hindsight from a rich class of policies. The method is validated in simulation on a dynamical system environment and compared to baseline open-loop planning and robust Hamilton- Jacobi reachability techniques. Further, it is implemented on a hardware example where a quadruped robot traverses a dense obstacle field and encounters input disturbances due to time delays, model uncertainty, and dynamics nonlinearities.

* 8 + 20 pages, 2 + 10 figures, in submission

Via

Access Paper or Ask Questions

FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Oct 12, 2022

Nathaniel Simon, Allen Z. Ren, Alexander Piqué, David Snyder, Daphne Barretto, Marcus Hultmark, Anirudha Majumdar

Figure 1 for FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Figure 2 for FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Figure 3 for FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Figure 4 for FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Abstract:Unmanned aerial vehicles (UAVs) are finding use in applications that place increasing emphasis on robustness to external disturbances including extreme wind. However, traditional multirotor UAV platforms do not directly sense wind; conventional flow sensors are too slow, insensitive, or bulky for widespread integration on UAVs. Instead, drones typically observe the effects of wind indirectly through accumulated errors in position or trajectory tracking. In this work, we integrate a novel flow sensor based on micro-electro-mechanical systems (MEMS) hot-wire technology developed in our prior work onto a multirotor UAV for wind estimation. These sensors are omnidirectional, lightweight, fast, and accurate. In order to achieve superior tracking performance in windy conditions, we train a `wind-aware' residual-based controller via reinforcement learning using simulated wind gusts and their aerodynamic effects on the drone. In extensive hardware experiments, we demonstrate the wind-aware controller outperforming two strong `wind-unaware' baseline controllers in challenging windy conditions.

* submitted to ICRA 2023

Via

Access Paper or Ask Questions

Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Feb 11, 2022

Alec Farid, David Snyder, Allen Z. Ren, Anirudha Majumdar

Figure 1 for Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Figure 2 for Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Figure 3 for Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Figure 4 for Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Abstract:We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a blackbox control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negative errors. In order to achieve this, we utilize techniques from Probably Approximately Correct (PAC)-Bayes generalization theory. In addition, we present novel class-conditional bounds that allow us to tradeoff the relative rates of false-positive vs. false-negative errors. We propose algorithms that train failure predictors (that take as input the history of sensor observations) by minimizing our theoretical error bounds. We demonstrate the resulting approach using extensive simulation and hardware experiments for vision-based navigation with a drone and grasping objects with a robotic manipulator equipped with a wrist-mounted RGB-D camera. These experiments illustrate the ability of our approach to (1) provide strong bounds on failure prediction error rates (that closely match empirical error rates), and (2) improve safety by predicting failures.

Via

Access Paper or Ask Questions

Generating Adversarial Disturbances for Controller Verification

Dec 12, 2020

Udaya Ghai, David Snyder, Anirudha Majumdar, Elad Hazan

Figure 1 for Generating Adversarial Disturbances for Controller Verification

Figure 2 for Generating Adversarial Disturbances for Controller Verification

Figure 3 for Generating Adversarial Disturbances for Controller Verification

Abstract:We consider the problem of generating maximally adversarial disturbances for a given controller assuming only blackbox access to it. We propose an online learning approach to this problem that adaptively generates disturbances based on control inputs chosen by the controller. The goal of the disturbance generator is to minimize regret versus a benchmark disturbance-generating policy class, i.e., to maximize the cost incurred by the controller as well as possible compared to the best possible disturbance generator in hindsight (chosen from a benchmark policy class). In the setting where the dynamics are linear and the costs are quadratic, we formulate our problem as an online trust region (OTR) problem with memory and present a new online learning algorithm (MOTR) for this problem. We prove that this method competes with the best disturbance generator in hindsight (chosen from a rich class of benchmark policies that includes linear-dynamical disturbance generating policies). We demonstrate our approach on two simulated examples: (i) synthetically generated linear systems, and (ii) generating wind disturbances for the popular PX4 controller in the AirSim simulator. On these examples, we demonstrate that our approach outperforms several baseline approaches, including $H_{\infty}$ disturbance generation and gradient-based methods.

Via

Access Paper or Ask Questions

CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

May 02, 2020

Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj(+11 more)

Figure 1 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Figure 2 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Figure 3 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Figure 4 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Abstract:Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous CHiME-5 recordings except for accurate array synchronization. The material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech. This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.

Via

Access Paper or Ask Questions

Probing the Information Encoded in X-vectors

Sep 30, 2019

Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Figure 1 for Probing the Information Encoded in X-vectors

Figure 2 for Probing the Information Encoded in X-vectors

Figure 3 for Probing the Information Encoded in X-vectors

Figure 4 for Probing the Information Encoded in X-vectors

Abstract:Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.

* Accepted at IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019

Via

Access Paper or Ask Questions