Heng Dong

Enhancing Decision-Making of Large Language Models via Actor-Critic

Jun 04, 2025

Heng Dong, Kefei Duan, Chongjie Zhang

Abstract:Large Language Models (LLMs) have achieved remarkable advancements in natural language processing tasks, yet they encounter challenges in complex decision-making scenarios that require long-term reasoning and alignment with high-level objectives. Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes, leading to sub-optimal decisions. This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, that effectively improves LLM policies with long-term action evaluations in a principled and scalable way. Our approach addresses two key challenges: (1) extracting robust action evaluations by computing Q-values via token logits associated with positive/negative outcomes, enhanced by future trajectory rollouts and reasoning; and (2) enabling efficient policy improvement through a gradient-free mechanism. Experiments across diverse environments -- including high-level decision-making (ALFWorld), low-level action spaces (BabyAI-Text), and large action spaces (WebShop) -- demonstrate the framework's generality and superiority over state-of-the-art methods. Notably, our approach achieves competitive performance using 7B/8B parameter LLMs, even outperforming baseline methods employing GPT-4 in complex tasks. These results underscore the potential of integrating structured policy optimization with LLMs' intrinsic knowledge to advance decision-making capabilities in multi-step environments.

* Forty-second International Conference on Machine Learning (ICML 2025)
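The two ingredients named in the abstract above can be illustrated with a minimal sketch. It assumes (this is not confirmed by the abstract) that the Q-value of an action is the log-odds between a "positive outcome" token and a "negative outcome" token, and that the gradient-free improvement step reweights the prior action distribution by exp(Q / alpha). The function names, the `alpha` temperature, and the toy logits are hypothetical.

```python
import math

def q_from_logits(pos_logit, neg_logit):
    # Assumption: Q is the log-odds of the positive-outcome token versus the
    # negative-outcome token. Under a two-way softmax this equals the logit gap.
    return pos_logit - neg_logit

def improve_policy(prior_probs, q_values, alpha=1.0):
    # Gradient-free update: reweight each action's prior probability by
    # exp(Q / alpha) and renormalize. No model parameters are changed.
    weights = {a: p * math.exp(q_values[a] / alpha) for a, p in prior_probs.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Toy example with two candidate actions and made-up critic logits.
prior = {"left": 0.5, "right": 0.5}
q_values = {
    "left": q_from_logits(2.0, 0.0),
    "right": q_from_logits(0.0, 2.0),
}
improved = improve_policy(prior, q_values)
```

A smaller `alpha` trusts the critic more and moves further from the prior; a larger `alpha` keeps the improved policy close to the original LLM distribution.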

Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards

Feb 18, 2025

Xinyi Yang, Liang Zeng, Heng Dong, Chao Yu, Xiaoran Wu, Huazhong Yang, Yu Wang, Milind Tambe, Tonghan Wang

Abstract:As humans increasingly share environments with diverse agents powered by RL, LLMs, and beyond, the ability to explain their policies in natural language will be vital for reliable coexistence. In this paper, we build a model-agnostic explanation generator based on an LLM. The technical novelty is that the rewards for training this LLM are generated by a generative flow matching model. This model has a specially designed structure with a hidden layer merged with an LLM to harness the linguistic cues of explanations into generating appropriate rewards. Experiments on both RL and LLM tasks demonstrate that our method can generate dense and effective rewards while saving on expensive human feedback; it thus enables effective explanations and even improves the accuracy of the decisions in original tasks.

On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow

Oct 17, 2024

Tonghan Wang, Heng Dong, Yanchen Jiang, David C. Parkes, Milind Tambe

Abstract:Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. Whereas recent approaches to address PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, the shared fixed points yield a distribution of possible states given joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states and the deviation is negatively correlated with the Jacobian rank. Inspired by this low-rank property, we bound the deviation by constructing a surrogate linear regression model that approximates the local behavior of diffusion models. With this bound, we propose a composite diffusion process iterating over agents with theoretical convergence guarantees to the true state.

Underwater Acoustic Signal Denoising Algorithms: A Survey of the State-of-the-art

Jul 18, 2024

Ruobin Gao, Maohan Liang, Heng Dong, Xuewen Luo, P. N. Suganthan

Abstract:This paper comprehensively reviews recent advances in underwater acoustic signal denoising, an area critical for improving the reliability and clarity of underwater communication and monitoring systems. Despite significant progress in the field, the complex nature of underwater environments poses unique challenges that complicate the denoising process. We begin by outlining the fundamental challenges associated with underwater acoustic signal processing, including signal attenuation, noise variability, and the impact of environmental factors. The review then systematically categorizes and discusses various denoising algorithms, such as conventional, decomposition-based, and learning-based techniques, highlighting their applications, advantages, and limitations. Evaluation metrics and experimental datasets are also reviewed. The paper concludes with a list of open questions and recommendations for future research directions, emphasizing the need for developing more robust denoising techniques that can adapt to the dynamic underwater acoustic environment.

Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design

Nov 02, 2023

Heng Dong, Junyu Zhang, Chongjie Zhang

Abstract:Multi-cellular robot design aims to create robots comprised of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. Initially, this strategy seeks optimal coarse-grained robots and progressively refines them. To mitigate the challenge of determining the precise refinement juncture during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, the extensive empirical studies on various challenging tasks sourced from EvoGym show our approach's superior efficiency and generalization capability.
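For readers unfamiliar with the Cross-Entropy Method mentioned in the abstract above, the following is a minimal sketch of the plain, Euclidean version: sample candidates from a Gaussian, keep the best-scoring fraction, and refit the Gaussian to those elites. It is not the refined, hyperbolic-space variant the paper describes. The function name, hyperparameters, and toy objective are assumptions made for illustration.

```python
import random
import statistics

def cem_maximize(score, dim, iters=30, pop=50, elite_frac=0.2,
                 init_std=2.0, seed=0):
    # Plain CEM with an independent Gaussian per dimension.
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Sample a population around the current mean.
        samples = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                   for _ in range(pop)]
        # Keep the highest-scoring candidates.
        samples.sort(key=score, reverse=True)
        elites = samples[:n_elite]
        # Refit mean and standard deviation to the elites.
        for d in range(dim):
            column = [s[d] for s in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6
    return mean

# Toy objective with its maximum at (1.0, -0.5).
def toy_score(x):
    return -((x[0] - 1.0) ** 2) - ((x[1] + 0.5) ** 2)

best = cem_maximize(toy_score, dim=2)
```

Because the sampling distribution narrows every iteration, plain CEM can converge prematurely when the optimum lies far from the initial mean, which is one reason to start with a wide `init_std`.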

Symmetry-Aware Robot Design with Structured Subgroups

May 31, 2023

Heng Dong, Junyu Zhang, Tonghan Wang, Chongjie Zhang

Abstract:Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous work on robot design has proven its ability to generate robots for various tasks. However, these works searched for robots directly in the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry in structured subgroups. Then robots are designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while covering the original design space, which is theoretically analyzed. We further empirically evaluate SARD on various tasks, and the results show its superior efficiency and generalizability.

* The Fortieth International Conference on Machine Learning (ICML 2023)
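To make the idea of "subgroups of the dihedral group" concrete, here is a small sketch using the dihedral group of the square acting on an n-by-n grid of cells. Both the grid representation and the choice of the square's symmetry group are assumptions for illustration and are not taken from the paper. A subgroup is generated from a rotation and/or a mirror, and a design is made symmetric by copying one representative value across each orbit.

```python
from itertools import product

def rotate(n):
    # 90-degree rotation of the grid: cell (r, c) moves to (c, n - 1 - r).
    return {(r, c): (c, n - 1 - r) for r, c in product(range(n), repeat=2)}

def reflect(n):
    # Left-right mirror: cell (r, c) moves to (r, n - 1 - c).
    return {(r, c): (r, n - 1 - c) for r, c in product(range(n), repeat=2)}

def compose(f, g):
    # Apply g first, then f.
    return {cell: f[g[cell]] for cell in g}

def generate_subgroup(generators, n):
    # Close the identity under left-multiplication by the generators.
    identity = {cell: cell for cell in product(range(n), repeat=2)}
    group, frontier = [identity], [identity]
    while frontier:
        elem = frontier.pop()
        for gen in generators:
            new = compose(gen, elem)
            if new not in group:
                group.append(new)
                frontier.append(new)
    return group

def symmetrize(design, group):
    # Give every cell the value of the smallest cell in its orbit,
    # so the result is invariant under every element of the group.
    return {cell: design[min(g[cell] for g in group)] for cell in design}

n = 3
full = generate_subgroup([rotate(n), reflect(n)], n)      # all 8 symmetries
rotations = generate_subgroup([rotate(n)], n)             # 4 rotations
mirror = generate_subgroup([reflect(n)], n)               # identity + mirror

design = {cell: i for i, cell in enumerate(product(range(n), repeat=2))}
mirrored = symmetrize(design, mirror)
```

Smaller subgroups impose weaker constraints: the mirror subgroup only ties left and right cells together, while the full group forces all four corners of the grid to share one value.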

Low-Rank Modular Reinforcement Learning via Muscle Synergy

Oct 26, 2022

Heng Dong, Tonghan Wang, Jiayuan Liu, Chongjie Zhang

Abstract:Modular Reinforcement Learning (RL) decentralizes the control of multi-joint robots by learning policies for each actuator. Previous work on modular RL has proven its ability to control morphologically different agents with a shared actuator policy. However, with the increase in the Degree of Freedom (DoF) of robots, training a morphology-generalizable modular controller becomes exponentially difficult. Motivated by the way the human central nervous system controls numerous muscles, we propose a Synergy-Oriented LeARning (SOLAR) framework that exploits the redundant nature of DoF in robot control. Actuators are grouped into synergies by an unsupervised learning method, and a synergy action is learned to control multiple actuators in synchrony. In this way, we achieve a low-rank control at the synergy level. We extensively evaluate our method on a variety of robot morphologies, and the results show its superior efficiency and generalizability, especially on robots with a large DoF like Humanoids++ and UNIMALs.

* 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
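The low-rank control described in the abstract above can be pictured with a short sketch: K synergy-level actions are broadcast to N actuators through a grouping, so the actuator command vector has at most K degrees of freedom. The grouping here is written by hand rather than learned, and the function name and optional per-actuator gains are hypothetical.

```python
def expand_synergy_actions(synergy_actions, assignment, gains=None):
    # assignment[i] is the index of the synergy that actuator i belongs to.
    # Every actuator in a synergy receives the same synergy action,
    # optionally scaled by a per-actuator gain.
    if gains is None:
        gains = [1.0] * len(assignment)
    return [gains[i] * synergy_actions[k] for i, k in enumerate(assignment)]

# Six actuators grouped into three synergies (hand-picked for illustration).
assignment = [0, 0, 1, 1, 1, 2]
synergy_actions = [0.5, -1.0, 0.2]
actuator_commands = expand_synergy_actions(synergy_actions, assignment)
```

The policy only has to output three numbers instead of six in this toy case; the saving grows with the number of actuators per synergy.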

Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

Apr 23, 2021

Heng Dong, Tonghan Wang, Jiayuan Liu, Chongjie Zhang

Abstract:How cooperation emerges is a long-standing and interdisciplinary problem. Game-theoretical studies on social dilemmas reveal that altruistic incentives are critical to the emergence of cooperation but their analyses are limited to stateless games. For more realistic scenarios, multi-agent reinforcement learning has been used to study sequential social dilemmas (SSDs). Recent works show that learning to incentivize other agents can promote cooperation in SSDs. However, with these incentivizing mechanisms, the team cooperation level does not converge and regularly oscillates between cooperation and defection during learning. We show that a second-order social dilemma resulting from these incentive mechanisms is the main reason for such fragile cooperation. We analyze the dynamics of this second-order social dilemma and find that a typical tendency of humans, called homophily, can solve the problem. We propose a novel learning framework to encourage incentive homophily and show that it achieves stable cooperation in both public goods dilemma and tragedy of the commons dilemma.

Off-Policy Multi-Agent Decomposed Policy Gradients

Jul 24, 2020

Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

Abstract:Recently, multi-agent policy gradient (MAPG) methods have witnessed vigorous progress. However, there is a discrepancy between the performance of MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that our method significantly outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at https://sites.google.com/view/dop-mapg.
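As a rough illustration of value function decomposition, the sketch below assumes a linear form in which the joint value is a non-negatively weighted sum of per-agent values plus a bias. The abstract does not spell out the exact decomposition, so the form, the function names, and the toy numbers are assumptions. With non-negative weights, each agent picking its own best action also maximizes the joint value, which is what makes per-agent credit assignment tractable.

```python
from itertools import product

def q_total(local_qs, joint_action, weights, bias=0.0):
    # Assumed linear decomposition: Q_tot = sum_i w_i * Q_i(a_i) + bias,
    # with every w_i >= 0.
    return bias + sum(w * q[a] for w, q, a in zip(weights, local_qs, joint_action))

def greedy_joint_action(local_qs):
    # Each agent independently picks the action with its highest local value.
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_qs)

# Two agents with 2 and 3 actions respectively (made-up values).
local_qs = [[0.1, 0.9], [0.5, 0.2, 0.3]]
weights = [2.0, 0.5]

decentralized = greedy_joint_action(local_qs)

# Brute-force search over all joint actions for comparison.
all_joint = product(*(range(len(q)) for q in local_qs))
centralized = max(all_joint, key=lambda a: q_total(local_qs, a, weights))
```

In this toy case the decentralized choice and the brute-force joint maximizer coincide, without enumerating the joint action space.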
