Shan Zhong

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

May 24, 2025

Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi

Abstract:Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving log-likelihood computation barriers. Furthermore, we also use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO's superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
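
The abstract's remark that alternating updates on a doubled variable can give an exactly invertible mapping can be illustrated with a generic additive-coupling sketch. This is not the GenPO implementation: the functions `f` and `g`, and the names `forward` and `inverse`, are hypothetical and chosen only for illustration. The sketch shows that updating one half from the other, in turn, can be undone exactly regardless of what `f` and `g` compute.

```python
import math


def f(v):
    # Hypothetical update function; it does not need to be invertible itself.
    return [math.tanh(t) for t in v]


def g(v):
    # A second hypothetical update function.
    return [0.5 * math.sin(t) for t in v]


def forward(x, y):
    # Update x using y, then update y using the new x.
    x_new = [a + b for a, b in zip(x, f(y))]
    y_new = [a + b for a, b in zip(y, g(x_new))]
    return x_new, y_new


def inverse(x_new, y_new):
    # Undo the two updates in reverse order.
    y = [a - b for a, b in zip(y_new, g(x_new))]
    x = [a - b for a, b in zip(x_new, f(y))]
    return x, y


if __name__ == "__main__":
    x0, y0 = [0.3, -1.2], [0.7, 2.0]
    x1, y1 = forward(x0, y0)
    print(inverse(x1, y1))  # recovers (x0, y0) up to floating-point rounding
```

Each additive step has a unit-triangular Jacobian, so this particular map preserves volume and the change-of-variables log-density term is zero. Whether GenPO's mechanism shares that property is not stated in the abstract.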

Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs

Nov 09, 2024

Shan Zhong, Jiahao Zeng, Yongxin Yu, Bohong Lin

Abstract:This paper introduces an innovative semi-supervised learning approach for text classification, addressing the challenge of abundant data but limited labeled examples. Our methodology integrates few-shot learning with retrieval-augmented generation (RAG) and conventional statistical clustering, enabling effective learning from a minimal number of labeled instances while generating high-quality labeled data. To the best of our knowledge, we are the first to incorporate RAG alongside clustering in text data generation. Our experiments on the Reuters and Web of Science datasets demonstrate state-of-the-art performance, with few-shot augmented data alone producing results nearly equivalent to those achieved with fully labeled datasets. Notably, accuracies of 95.41% and 82.43% were achieved for complex text document classification tasks, where the number of categories can exceed 100.

Monitoring Drug-Induced Brain Activity Changes with Functional Ultrasound Imaging and Convolutional Neural Networks

Oct 12, 2024

Jared Deighton, Shan Zhong, Kofi Agyeman, Wooseong Choi, Charles Liu, Darrin Lee, Vasileios Maroulas, Vasileios Christopoulos

Abstract:Functional ultrasound imaging (fUSI) is a cutting-edge technology that measures changes in cerebral blood volume (CBV) by detecting backscattered echoes from red blood cells moving within its field of view (FOV). It offers high spatiotemporal resolution and sensitivity, allowing for detailed visualization of cerebral blood flow dynamics. While fUSI has been utilized in preclinical drug development studies to explore the mechanisms of action of various drugs targeting the central nervous system, many of these studies have primarily focused on predetermined regions of interest (ROIs). This focus may overlook relevant brain activity outside these specific areas, which could influence the results. To address this limitation, we combined convolutional neural networks (CNNs) with fUSI to comprehensively understand the pharmacokinetic process of Dizocilpine, also known as MK-801, a drug that blocks the N-Methyl-D-aspartate (NMDA) receptor in the central nervous system. CNN and class activation mapping (CAM) revealed the spatiotemporal effects of MK-801, which originated in the cortex and propagated to the hippocampus, demonstrating the ability to detect dynamic drug effects over time. Additionally, CNN and CAM assessed the impact of anesthesia on the spatiotemporal hemodynamics of the brain, revealing no distinct patterns between early and late stages. The integration of fUSI and CNN provides a powerful tool to gain insights into the spatiotemporal dynamics of drug action in the brain. This combination enables a comprehensive and unbiased assessment of drug effects on brain function, potentially accelerating the development of new therapies in neuropharmacological studies.

Adaptive Environment-Aware Robotic Arm Reaching Based on a Bio-Inspired Neurodynamical Computational Framework

Jul 16, 2024

Dimitrios Chatziparaschis, Shan Zhong, Vasileios Christopoulos, Konstantinos Karydis

Abstract:Bio-inspired robotic systems are capable of adaptive learning, scalable control, and efficient information processing. Enabling real-time decision-making for such systems is critical to respond to dynamic changes in the environment. We focus on dynamic target tracking in open areas using a robotic six-degree-of-freedom manipulator with a bird's-eye view camera for visual feedback, and by deploying the Neurodynamical Computational Framework (NeuCF). NeuCF is a recently developed bio-inspired model for target tracking based on Dynamic Neural Fields (DNFs) and Stochastic Optimal Control (SOC) theory. It has been trained for reaching actions on a planar surface toward localized visual beacons, and it can re-target or generate stop signals on the fly based on changes in the environment (e.g., a new target has emerged, or an existing one has been removed). We evaluated our system over various target-reaching scenarios. In all experiments, NeuCF had high end-effector positional accuracy, generated smooth trajectories, and provided reduced path lengths compared with a baseline cubic polynomial trajectory generator. In all, the developed system offers a robust and dynamic-aware robotic manipulation approach that affords real-time decision-making.

* 6 pages, 6 figures, conference

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Dec 29, 2023

Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Meikang Qiu

Abstract:The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

* 20 pages, 1 figure, 22 tables
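
The two ingredients named in the abstract above, Gaussian noise on weight updates and low-rank adaptation to shrink what is transmitted, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DP-LoRA algorithm: the clipping step, the parameter names `max_norm` and `noise_multiplier`, and all function names are hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip_l2(update, max_norm):
    # Scale the update down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, max_norm, noise_multiplier, rng):
    # Gaussian mechanism (assumed form): clip, then add zero-mean noise
    # whose standard deviation is proportional to the clipping bound.
    clipped = clip_l2(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


def lora_param_count(d, k, r):
    # A rank-r factorization of a d x k update stores a d x r and an r x k matrix.
    return r * (d + k)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(privatize([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng))
    print(lora_param_count(1024, 1024, 8), "values sent instead of", 1024 * 1024)
```

In a federated round, each client would apply something like `privatize` to its flattened low-rank update before sending it to the server. The rank `r` controls the communication saving shown by `lora_param_count`.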

Variational Bayesian Approximations Kalman Filter Based on Threshold Judgment

Sep 22, 2023

Zuxuan Zhang, Gang Wang, Jiacheng He, Shan Zhong

Abstract:The estimation of non-Gaussian measurement noise models is a significant challenge across various fields. In practical applications, such estimation is often hampered by the large number of parameters and high computational complexity. This paper proposes a threshold-based Kalman filtering approach for online estimation of noise parameters in non-Gaussian measurement noise models. This method uses a certain amount of sample data to infer the variance threshold of observation parameters and employs variational Bayesian estimation to obtain corresponding noise variance estimates, enabling subsequent iterations of the Kalman filtering algorithm. Finally, we evaluate the performance of this algorithm through simulation experiments, demonstrating its accurate and effective estimation of state and noise parameters.

* 5 pages, conference
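
To show where a measurement-noise variance estimate enters the Kalman recursion mentioned in the abstract above, here is a minimal scalar predict/update step. It is a textbook sketch, not the paper's threshold-based variational Bayesian method: the variance `r` is simply assumed to be supplied by some external estimator, and the function name `kalman_step` and its defaults are hypothetical.

```python
def kalman_step(x, p, z, q, r, a=1.0, h=1.0):
    """One scalar Kalman filter iteration.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances (r assumed given)
    a, h: state-transition and observation coefficients
    """
    # Predict the state and its variance one step ahead.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update with the measurement; the gain shrinks as r grows.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Same measurement, two different assumed noise variances.
    print(kalman_step(0.0, 1.0, 1.0, q=0.0, r=1.0))
    print(kalman_step(0.0, 1.0, 1.0, q=0.0, r=100.0))
```

A larger `r` makes the filter trust the measurement less, which is why an accurate online estimate of the noise variance matters when the noise is heavy-tailed.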

Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

Aug 03, 2023

Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan

Abstract:Query autocomplete (QAC), also known as typeahead, suggests a list of complete queries as the user types a prefix in the search box. It is one of the key features of modern search engines, especially in e-commerce. One of the goals of typeahead is to suggest relevant queries that are seasonally important to users. In this paper, we propose a neural-network-based natural language processing (NLP) algorithm that incorporates seasonality as a signal, and we present an end-to-end evaluation of the QAC ranking model. Incorporating seasonality into the autocomplete ranking model can improve autocomplete relevance and business metrics.

* Accepted at The 6th Workshop on e-Commerce and NLP (ECNLP 6), KDD'23, Long Beach, CA

Quantized generalized minimum error entropy for kernel recursive least squares adaptive filtering

Jul 04, 2023

Jiacheng He, Gang Wang, Kun Zhang, Shan Zhong, Bei Peng, Min Li

Abstract:The robustness of the kernel recursive least squares (KRLS) algorithm has recently been improved by combining it with more robust information-theoretic learning criteria, such as minimum error entropy (MEE) and generalized MEE (GMEE), which also increases the computational complexity of the KRLS-type algorithms to a certain extent. To reduce the computational load of the KRLS-type algorithms, the quantized GMEE (QGMEE) criterion is combined with the KRLS algorithm in this paper, and as a result two kinds of KRLS-type algorithms, called quantized kernel recursive MEE (QKRMEE) and quantized kernel recursive GMEE (QKRGMEE), are designed. The mean error behavior, mean square error behavior, and computational complexity of the proposed algorithms are also investigated. In addition, simulation and real experimental data are utilized to verify the feasibility of the proposed algorithms.

Minimum Error Entropy Rauch-Tung-Striebel Smoother

Jan 14, 2023

Jiacheng He, Hongwei Wang, Gang Wang, Shan Zhong, Bei Peng

Abstract:Outliers and impulsive disturbances often cause heavy-tailed distributions in practical applications, and these will degrade the performance of Gaussian approximation smoothing algorithms. To improve the robustness of the Rauch-Tung-Striebel (RTS) smoother against complicated non-Gaussian noises, a new RTS smoother integrated with the minimum error entropy (MEE) criterion (MEE-RTS) is proposed for linear systems, which is also extended to the state estimation of nonlinear systems by utilizing the Taylor series linearization approach. The mean error behavior, the mean square error behavior, and the computational complexity of the MEE-RTS smoother are analyzed. According to simulation results, the proposed smoothers perform better than several robust solutions in terms of steady-state error.

S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data

Aug 24, 2021

Shan Zhong, David B. Hitchcock

Abstract:We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics and text-based sentiment data to predict S&P stock prices. A 66.18% accuracy in S&P 500 index directional prediction and a 62.09% accuracy in individual stock directional prediction were achieved by combining different machine learning models, such as Random Forest and LSTM, into state-of-the-art ensemble models. The data we use contains weekly historical prices, finance reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovation includes utilizing deep language models to categorize and infer financial news item sentiment; fusing different models containing different combinations of variables and stocks to jointly make predictions; and overcoming the insufficient data problem for machine learning models in time series by using data across different stocks.

* 20 pages, 10 figures
