
Hengyuan Hu

Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation

May 06, 2025

Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. This general insight enables near-black-box adaptation of various performance optimization techniques from autoregressive models to the diffusion setting. To demonstrate this, we introduce Autospeculative Decoding (ASD), an extension of the widely used speculative decoding algorithm to DDPMs that does not require any auxiliary draft models. Our theoretical analysis shows that ASD achieves a $\tilde{O}(K^{1/3})$ parallel runtime speedup over the $K$-step sequential DDPM. We also demonstrate that a practical implementation of autospeculative decoding accelerates DDPM inference significantly in various domains.
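
To get a feel for the stated rate, the sketch below evaluates what a speedup of exactly $K^{1/3}$ would mean for a few step counts. It is only an idealized reading of the asymptotic bound: the constants and logarithmic factors hidden by the $\tilde{O}$ notation are ignored, and the function name is a hypothetical one introduced for illustration, not something taken from the paper.

```python
def idealized_parallel_rounds(num_steps: int) -> float:
    """Rounds remaining if a K-step sampler were sped up by exactly K**(1/3).

    Assumption: constants and log factors hidden by the O-tilde bound are
    dropped, so this is an illustration of the scaling, not a prediction.
    """
    speedup = num_steps ** (1.0 / 3.0)
    return num_steps / speedup  # equals K**(2/3)


if __name__ == "__main__":
    for k in (64, 512, 1000):
        rounds = idealized_parallel_rounds(k)
        print(f"K={k}: idealized speedup {k / rounds:.1f}x, about {rounds:.0f} rounds")
```

Under this reading, the number of sequential rounds grows like $K^{2/3}$ rather than $K$, so the benefit increases with the length of the sampling schedule.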

What's the Move? Hybrid Imitation Learning via Salient Points

Dec 06, 2024

Priya Sundaresan, Hengyuan Hu, Quan Vuong, Jeannette Bohg, Dorsa Sadigh

Abstract: While imitation learning (IL) offers a promising framework for teaching robots various behaviors, learning complex tasks remains challenging. Existing IL policies struggle to generalize effectively across visual and spatial variations even for simple tasks. In this work, we introduce SPHINX: Salient Point-based Hybrid ImitatioN and eXecution, a flexible IL policy that leverages multimodal observations (point clouds and wrist images), along with a hybrid action space of low-frequency, sparse waypoints and high-frequency, dense end-effector movements. Given 3D point cloud observations, SPHINX learns to infer task-relevant points within a point cloud, or salient points, which support spatial generalization by focusing on semantically meaningful features. These salient points serve as anchor points to predict waypoints for long-range movement, such as reaching target poses in free space. Once near a salient point, SPHINX learns to switch to predicting dense end-effector movements given close-up wrist images for precise phases of a task. By exploiting the strengths of different input modalities and action representations for different manipulation phases, SPHINX tackles complex tasks in a sample-efficient, generalizable manner. Our method achieves 86.7% success across 4 real-world and 2 simulated tasks, outperforming the next best state-of-the-art IL baseline by 41.1% on average across 440 real-world trials. SPHINX additionally generalizes to novel viewpoints, visual distractors, spatial arrangements, and execution speeds with a 1.7x speedup over the most competitive baseline. Our website (http://sphinx-manip.github.io) provides open-sourced code for data collection, training, and evaluation, along with supplementary videos.

Imitation Bootstrapped Reinforcement Learning

Nov 20, 2023

Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh

Abstract: Despite the considerable potential of reinforcement learning (RL), robotics control tasks predominantly rely on imitation learning (IL) owing to its better sample efficiency. However, given the high cost of collecting extensive demonstrations, RL is still appealing if it can utilize limited imitation data for efficient autonomous self-improvement. Existing RL methods that utilize demonstrations either initialize the replay buffer with demonstrations and oversample them during RL training, which does not benefit from the generalization potential of modern IL methods, or pretrain the RL policy with IL on the demonstrations, which requires additional mechanisms to prevent catastrophic forgetting during RL fine-tuning. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework that first trains an IL policy on a limited number of demonstrations and then uses it to propose alternative actions for both online exploration and target value bootstrapping. IBRL achieves SoTA performance and sample efficiency on 7 challenging sparse reward continuous control tasks in simulation while learning directly from pixels. As a highlight of our method, IBRL achieves a $6.4\times$ higher success rate than RLPD, a strong method that combines the idea of oversampling demonstrations with modern RL improvements, under the budget of 10 demos and 100K interactions in the challenging PickPlaceCan task in the Robomimic benchmark.
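
The idea of letting a frozen IL policy "propose alternative actions" for both exploration and bootstrapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the agent compares the IL proposal and the RL proposal with a learned critic and keeps the higher-valued one, and all names (`il_policy`, `rl_policy`, `q_fn`, `select_action`, `bootstrap_target`) are hypothetical.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def select_action(state: State, il_policy: Policy, rl_policy: Policy,
                  q_fn: Critic) -> Action:
    """Exploration step: keep whichever proposal the critic scores higher.

    Assumption: a greedy comparison between exactly two candidate actions.
    """
    candidates = [il_policy(state), rl_policy(state)]
    return max(candidates, key=lambda action: q_fn(state, action))


def bootstrap_target(reward: float, next_state: State, done: bool,
                     il_policy: Policy, rl_policy: Policy, q_fn: Critic,
                     gamma: float = 0.99) -> float:
    """TD target that bootstraps from the better of the two next-state proposals."""
    if done:
        return reward
    best_next_value = max(
        q_fn(next_state, il_policy(next_state)),
        q_fn(next_state, rl_policy(next_state)),
    )
    return reward + gamma * best_next_value
```

In this sketch the IL policy is never fine-tuned, so there is nothing for RL updates to forget; it only contributes candidate actions that the critic can accept or reject.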

Toward Grounded Social Reasoning

Jun 14, 2023

Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh

Abstract: Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not socially appropriate to disassemble the sports car and put it away as part of the "tidying". How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable social reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and *actively gather information from the environment* that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and a vision-language model (VLM) to help a robot actively perceive its environment to perform grounded social reasoning. To evaluate our framework at scale, we release the MessySurfaces dataset, which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MessySurfaces benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/groundedsocialreasoning.

The Update Equivalence Framework for Decision-Time Planning

Apr 25, 2023

Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Abstract: The process of revising (or constructing) a policy immediately prior to execution -- known as decision-time planning -- is key to achieving superhuman performance in perfect-information settings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on subgames but rather on the notion of update equivalence. In this framework, decision-time planning algorithms simulate updates of synchronous learning algorithms. This framework enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in settings with large amounts of non-public information. In experiments, members of this family produce comparable or superior results compared to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.

Language Instructed Reinforcement Learning for Human-AI Coordination

Apr 13, 2023

Hengyuan Hu, Dorsa Sadigh

Abstract: One of the fundamental quests of AI is to produce agents that coordinate well with humans. This problem is challenging, especially in domains that lack high-quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to different equilibria from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions. We use pretrained large language models to generate a prior policy conditioned on the human instruction and use the prior to regularize the RL objective. This leads to the RL agent converging to equilibria that are aligned with human preferences. We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment as well as the challenging Hanabi benchmark. Finally, we show that knowing the language instruction significantly boosts human-AI coordination performance in human evaluations in Hanabi.
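
One standard way to "use the prior to regularize the RL objective" is a KL penalty toward the prior policy. The sketch below shows the well-known closed-form maximizer of expected value minus `lam` times the KL divergence from the prior, for a single state with discrete actions. Whether instructRL uses exactly this form is an assumption made here for illustration; the function name and the weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def prior_regularized_policy(q_values: Sequence[float],
                             prior: Sequence[float],
                             lam: float) -> List[float]:
    """Policy maximizing E_pi[Q] - lam * KL(pi || prior) for one state.

    The maximizer is pi(a) proportional to prior(a) * exp(Q(a) / lam).
    Assumption: `prior` would come from a language model conditioned on the
    instruction; here it is just a list of probabilities.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    logits = [
        q / lam + (math.log(p) if p > 0 else float("-inf"))
        for q, p in zip(q_values, prior)
    ]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

A large `lam` keeps the agent close to the instruction-conditioned prior, while a small `lam` lets the learned values dominate; actions the prior rules out entirely receive zero probability.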

Human-AI Coordination via Human-Regularized Search and Learning

Oct 11, 2022

Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown

Abstract: We consider the problem of making AI agents that collaborate well with humans in partially observable, fully cooperative environments given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far away from it, we develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark. We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Then, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large-scale experiments with humans. First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. Second, we show that our method beats a vanilla best response to behavioral cloning baseline by having experts play repeatedly with the two agents.

K-level Reasoning for Zero-Shot Coordination in Hanabi

Jul 14, 2022

Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster

Abstract: The standard problem setting in cooperative multi-agent settings is self-play (SP), where the goal is to train a team of agents that works well together. However, optimal SP policies commonly contain arbitrary conventions ("handshakes") and are not compatible with other, independently trained agents or humans. This latter desideratum was recently formalized by Hu et al. (2020) as the zero-shot coordination (ZSC) setting and partially addressed with their Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance in the card game Hanabi. OP assumes access to the symmetries of the environment and prevents agents from breaking these in a mutually incompatible way during training. However, as the authors point out, discovering symmetries for a given environment is a computationally hard problem. Instead, we show that through a simple adaptation of k-level reasoning (KLR; Costa Gomes et al. 2006), synchronously training all levels, we can obtain competitive ZSC and ad-hoc teamplay performance in Hanabi, including when paired with a human-like proxy bot. We also introduce a new method, synchronous-k-level reasoning with a best response (SyKLRBR), which further improves performance on our synchronous KLR by co-training a best response.

* Advances in Neural Information Processing Systems 2021. Vol 34. 8215--8228
* NeurIPS 2021. 15 pages. 2 figures

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Sep 30, 2021

Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown

Abstract: Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work we replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning, and show that this approach outperforms state-of-the-art search algorithms in benchmark settings. In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi, and show the generality of our algorithm by also showing that it outperforms tabular search in the Atari game Ms. Pac-Man.

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

Jun 16, 2021

Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster

Abstract: Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games. However, one major limitation of prior search approaches for partially observable environments is that the computational cost scales poorly with the amount of hidden information. In this paper we present Learned Belief Search (LBS), a computationally efficient search procedure for partially observable environments. Rather than maintaining an exact belief distribution, LBS uses an approximate auto-regressive counterfactual belief that is learned as a supervised task. In multi-agent settings, LBS uses a novel public-private model architecture for underlying policies in order to efficiently evaluate these policies during rollouts. In the benchmark domain of Hanabi, LBS can obtain 55% to 91% of the benefit of exact search while reducing compute requirements by $35.8\times$ to $4.6\times$, allowing it to scale to larger settings that were inaccessible to previous search methods.
