Henglin Pu

Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning

Sep 11, 2025

Xuefeng Wang, Lei Zhang, Henglin Pu, Ahmed H. Qureshi, Husheng Li

Abstract: Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with differential value functions defined as viscosity solutions of the Hamilton-Jacobi-Bellman (HJB) equation. While CTRL has shown promise, its applications have been largely limited to the single-agent domain. This limitation stems from two key challenges: (i) conventional solution methods for HJB equations suffer from the curse of dimensionality (CoD), making them intractable in high-dimensional systems; and (ii) even with HJB-based learning approaches, accurately approximating centralized value functions in multi-agent settings remains difficult, which in turn destabilizes policy training. In this paper, we propose a CT-MARL framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. To ensure the value is consistent with its differential structure, we align value learning with value-gradient learning by introducing a Value Gradient Iteration (VGI) module that iteratively refines value gradients along trajectories. This improves gradient fidelity, in turn yielding more accurate values and stronger policy learning. We evaluate our method using continuous-time variants of standard benchmarks, including multi-agent particle environment (MPE) and multi-agent MuJoCo. Our results demonstrate that our approach consistently outperforms existing continuous-time RL baselines and scales to complex multi-agent dynamics.

* 19 pages, 10 figures
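
The HJB residual that a physics-informed value approximator is trained to drive toward zero can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a hypothetical scalar system dx/dt = u with running cost x^2 + u^2 (undiscounted, infinite horizon), for which V(x) = x^2 satisfies the HJB equation 0 = min_u [x^2 + u^2 + V'(x) u] exactly. The function names are illustrative, and a finite difference stands in for the automatic differentiation a neural network would use.

```python
def value_gradient(value_fn, x, h=1e-5):
    """Central finite-difference estimate of V'(x)."""
    return (value_fn(x + h) - value_fn(x - h)) / (2 * h)


def hjb_residual(value_fn, x):
    """Residual of 0 = min_u [x^2 + u^2 + V'(x) * u] for dynamics dx/dt = u.

    Assumed toy problem, not taken from the paper.
    """
    grad = value_gradient(value_fn, x)
    u_star = -grad / 2.0  # minimizer of u^2 + grad * u
    return x ** 2 + u_star ** 2 + grad * u_star


# The true value function leaves (numerically) no residual ...
exact = hjb_residual(lambda x: x * x, 1.5)
# ... while a wrong candidate, V(x) = 2 x^2, does.
wrong = hjb_residual(lambda x: 2 * x * x, 1.0)
print(exact, wrong)
```

In this toy setting the residual depends on the value only through its gradient, which is one way to see why accurate value gradients matter for HJB-based learning.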

Trellis Waveform Shaping for Sidelobe Reduction in Integrated Sensing and Communications: A Duality with PAPR Mitigation

May 14, 2025

Henglin Pu, Husheng Li, Zhu Han, H. Vincent Poor

Abstract: A key challenge in integrated sensing and communications (ISAC) is the synthesis of waveforms that can modulate communication messages and achieve good sensing performance simultaneously. In ISAC systems, standard communication waveforms can be adapted for sensing, as the sensing receiver (co-located with the transmitter) has knowledge of the communication message and consequently the waveform. However, the randomness of communications may result in waveforms whose high sidelobes mask weak targets. Thus, it is desirable to refine communication waveforms to improve the sensing performance by reducing the integrated sidelobe level (ISL). This is similar to peak-to-average power ratio (PAPR) mitigation in orthogonal frequency division multiplexing (OFDM), in which the OFDM-modulated waveform needs to be refined to reduce the PAPR. In this paper, inspired by PAPR reduction algorithms in OFDM, we employ trellis shaping in OFDM-based ISAC systems to refine waveforms for specific sensing metrics using convolutional codes and Viterbi decoding. In such a scheme, the communication data is encoded and then mapped to the signaling constellation in different subcarriers, such that the time-domain sidelobes are reduced. An interesting observation is that sidelobe reduction in OFDM-based ISAC is dual to PAPR reduction in OFDM, so the two share a similar signaling structure. Numerical simulations and hardware experiments on a USRP software-defined radio are carried out to demonstrate the effectiveness of the proposed trellis shaping approach.
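
The two metrics named in this abstract can be computed directly. The sketch below assumes one common definition of each: PAPR as peak power over mean power of the time-domain samples, and ISL as the sum of squared aperiodic autocorrelation magnitudes over the nonzero positive lags. The paper may normalize either metric differently, and the helper names are illustrative; no trellis shaping is performed here.

```python
import cmath


def idft(symbols):
    """Inverse DFT: map frequency-domain subcarrier symbols to time samples."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]


def papr(samples):
    """Peak-to-average power ratio of a block of time-domain samples."""
    powers = [abs(x) ** 2 for x in samples]
    return max(powers) / (sum(powers) / len(powers))


def integrated_sidelobe_level(samples):
    """Sum of |r[lag]|^2 over lags 1..n-1 of the aperiodic autocorrelation."""
    n = len(samples)
    isl = 0.0
    for lag in range(1, n):
        r = sum(samples[t + lag] * samples[t].conjugate() for t in range(n - lag))
        isl += abs(r) ** 2
    return isl


# Identical symbols on all 8 subcarriers add coherently into a single time-domain peak.
waveform = idft([1 + 0j] * 8)
print(papr(waveform), integrated_sidelobe_level(waveform))
```

A shaping scheme would search over admissible symbol sequences to lower whichever metric is targeted; the functions above only score a given sequence.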

OTFS-ISAC System with Sub-Nyquist ADC Sampling Rate

Feb 07, 2025

Henglin Pu, Xuefeng Wang, Ajay Kumar, Lu Su, Husheng Li

Abstract: Integrated sensing and communication (ISAC) has emerged as a pivotal technology for next-generation wireless communication and radar systems, enabling high-resolution sensing and high-throughput communication with shared spectrum and hardware. However, achieving a fine radar resolution often requires high-rate analog-to-digital converters (ADCs) and substantial storage, making it both expensive and impractical for many commercial applications. To address these challenges, this paper proposes an orthogonal time frequency space (OTFS)-based ISAC architecture that operates at reduced ADC sampling rates, yet preserves accurate radar estimation and supports simultaneous communication. The proposed architecture introduces pilot symbols directly in the delay-Doppler (DD) domain and leverages the mapping between the DD and time-frequency (TF) domains to keep selected subcarriers active while the others are inactive, allowing the radar receiver to exploit under-sampling aliasing and recover the original DD signal at much lower sampling rates. To further enhance the radar accuracy, we develop an iterative interference estimation and cancellation algorithm that mitigates data symbol interference. We propose a code-based spreading technique that distributes data across the DD domain to preserve the maximum unambiguous radar sensing range. For communication, we implement a complete transceiver pipeline optimized for reduced-sampling-rate systems, including synchronization, channel estimation, and iterative data detection. Experimental results from a software-defined radio (SDR)-based testbed confirm that our method substantially lowers the required sampling rate without sacrificing radar sensing performance and ensures reliable communication.
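
The idea of exploiting under-sampling aliasing with only some subcarriers active can be illustrated with a generic DFT identity. The sketch below is not the paper's OTFS architecture: it assumes a hypothetical 16-subcarrier block, a decimation factor of 4, and an active set chosen so that each active subcarrier has a distinct index modulo 16/4 = 4. Under that assumption, keeping every fourth time sample folds each active subcarrier into its own bin of the short DFT, so nothing collides.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(n^2) DFT; the inverse carries the 1/n normalization."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


N, D = 16, 4   # block length and decimation factor (assumed values)
M = N // D     # length of the under-sampled block

# Hypothetical active subcarriers: indices 0, 5, 10, 15 are distinct modulo M.
active = {0: 1 + 1j, 5: 1 - 1j, 10: -1 + 1j, 15: -1 - 1j}
X = [active.get(k, 0j) for k in range(N)]

x = dft(X, inverse=True)   # full-rate time-domain samples
y = x[::D]                 # keep every D-th sample (reduced sampling rate)

# Aliasing folds subcarrier k into bin k mod M, scaled by 1/D.
recovered = [D * v for v in dft(y)]
for k, symbol in active.items():
    print(k, symbol, recovered[k % M])
```

If two active subcarriers shared the same index modulo M, their symbols would add in one bin and could not be separated by this step alone.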

DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning

Mar 12, 2024

Xuefeng Wang, Henglin Pu, Hyung Jun Kim, Husheng Li

Abstract: Safe multi-agent reinforcement learning (safe MARL) has gained increasing attention in recent years, emphasizing the need for agents not only to optimize the global return but also to adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have been only very limited applications of Model Predictive Control (MPC) methods in this domain, primarily due to the complex and implicit dynamics characteristic of multi-agent environments. To bridge this gap, we propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC). The key insight of DeepSafeMPC is leveraging a centralized deep learning model to accurately predict environmental dynamics. Our method applies MARL principles to search for optimal solutions. By employing MPC, the actions of all agents can be constrained to safe states concurrently. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment, showcasing significant advancements in addressing safety concerns in MARL.

* 8 pages, 5 figures
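
The general pattern of using a predictive model to keep planned actions inside a safe set can be sketched in a few lines. The example below is not DeepSafeMPC: it assumes a hypothetical single-agent, one-dimensional system, a hand-written `predict` function standing in for a learned dynamics model, a safe set |x| <= limit, and exhaustive search over short action sequences in place of a real optimizer.

```python
from itertools import product


def predict(state, action):
    """Stand-in for a learned dynamics model (assumed: simple integrator)."""
    return state + action


def safe_mpc_action(state, goal, limit, horizon=3, actions=(-1, 0, 1)):
    """Return the first action of the cheapest plan whose predicted states stay safe.

    Returns None when no plan keeps |x| <= limit over the whole horizon.
    """
    best_cost, best_first = None, None
    for plan in product(actions, repeat=horizon):
        x, cost, safe = state, 0, True
        for a in plan:
            x = predict(x, a)
            if abs(x) > limit:      # predicted safety violation: discard this plan
                safe = False
                break
            cost += abs(goal - x)   # distance to goal as the running cost
        if safe and (best_cost is None or cost < best_cost):
            best_cost, best_first = cost, plan[0]
    return best_first


# The goal lies outside the safe set, so the planner approaches it and then holds.
print(safe_mpc_action(state=0, goal=5, limit=2))
print(safe_mpc_action(state=2, goal=5, limit=2))
```

Only the first action of the chosen plan is applied and the search is repeated at the next step, which is the receding-horizon idea behind MPC.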

Investigating Skin Temperature-Based Overheating in mmWave Smartphones: Power and Thermal Models for Optimal Non-Throttling Performance

Mar 26, 2023

Henglin Pu, Xingqi Wu

Abstract: 5G mmWave, as a revolutionary cellular technology, holds monumental potential for innovations in many academic and industrial areas. However, widespread adoption of this technology is hindered by the severe overheating issues experienced by current Commercial Off-The-Shelf (COTS) mmWave smartphones. This study aims to identify the root causes of skin-temperature-related throttling during 5G transmission, and to quantify the power reduction required to prevent such throttling at a given ambient temperature. The key insight of our paper is leveraging the power model and thermal model of a mmWave smartphone to obtain the quantitative relationship among power consumption, ambient temperature, and device skin temperature. This approach allows us to determine the extent of power reduction required to prevent throttling under specific ambient temperature conditions.
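
One simple way to relate power, ambient temperature, and skin temperature is a steady-state lumped thermal-resistance model, T_skin = T_ambient + R_th * P. The sketch below assumes that model purely for illustration; the abstract does not state the form of the paper's thermal model, and the threshold, resistance, and power values are hypothetical.

```python
def max_power_without_throttling(throttle_temp_c, ambient_temp_c, r_th_c_per_w):
    """Largest steady-state power (W) keeping skin temperature at or below the threshold.

    Assumes T_skin = T_ambient + R_th * P (lumped model, not taken from the paper).
    """
    headroom = throttle_temp_c - ambient_temp_c
    return max(headroom, 0.0) / r_th_c_per_w


def required_power_reduction(current_power_w, throttle_temp_c, ambient_temp_c, r_th_c_per_w):
    """Power (W) that must be shed to avoid throttling; zero if already within budget."""
    budget = max_power_without_throttling(throttle_temp_c, ambient_temp_c, r_th_c_per_w)
    return max(current_power_w - budget, 0.0)


# Hypothetical numbers: 43 C skin threshold, 25 C ambient, 4.5 C/W thermal resistance.
print(max_power_without_throttling(43.0, 25.0, 4.5))
print(required_power_reduction(5.5, 43.0, 25.0, 4.5))
```

Under this assumed model a warmer environment shrinks the power budget linearly, so the same workload needs a larger power reduction to stay below the throttling threshold.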
