Yuhua Zhu

A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics

Apr 02, 2025

Qihao Ye, Xiaochuan Tian, Yuhua Zhu

Abstract: This paper develops a model-based framework for continuous-time policy evaluation (CTPE) in reinforcement learning, incorporating both Brownian and Lévy noise to model stochastic dynamics influenced by rare and extreme events. Our approach formulates the policy evaluation problem as solving a partial integro-differential equation (PIDE) for the value function with unknown coefficients. A key challenge in this setting is accurately recovering the unknown coefficients in the stochastic dynamics, particularly when driven by Lévy processes with heavy tail effects. To address this, we propose a robust numerical approach that effectively handles both unbiased and censored trajectory datasets. This method combines maximum likelihood estimation with an iterative tail correction mechanism, improving the stability and accuracy of coefficient recovery. Additionally, we establish a theoretical bound for the policy evaluation error based on coefficient recovery error. Through numerical experiments, we demonstrate the effectiveness and robustness of our method in recovering heavy-tailed Lévy dynamics and verify the theoretical error analysis in policy evaluation.

* 27 pages, 9 figures

PsyPlay: Personality-Infused Role-Playing Conversational Agents

Feb 06, 2025

Tao Yang, Yuhua Zhu, Xiaojun Quan, Cong Liu, Qifan Wang

Abstract: The current research on Role-Playing Conversational Agents (RPCAs) with Large Language Models (LLMs) primarily focuses on imitating specific speaking styles and utilizing character backgrounds, neglecting the depiction of deeper personality traits. In this study, we introduce personality-infused role-playing for LLM agents, which encourages agents to accurately portray their designated personality traits during dialogues. We then propose PsyPlay, a dialogue generation framework that facilitates the expression of rich personalities among multiple LLM agents. Specifically, PsyPlay enables agents to assume roles with distinct personality traits and engage in discussions centered around specific topics, consistently exhibiting their designated personality traits throughout the interactions. Validation on generated dialogue data demonstrates that PsyPlay can accurately portray the intended personality traits, achieving an overall success rate of 80.31% on GPT-3.5. Notably, we observe that LLMs aligned with positive values are more successful in portraying positive personality roles compared to negative ones. Moreover, we construct a dialogue corpus for personality-infused role-playing, called PsyPlay-Bench. The corpus, which consists of 4745 instances of correctly portrayed dialogues using PsyPlay, aims to further facilitate research in personalized role-playing and dialogue personality detection.

Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization

Dec 03, 2024

Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu

Abstract: Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of jeopardizing and compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB$^2$O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB$^2$O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB$^2$O against a diverse range of attacks. We thereby offer insights into how specific hyperparameter choices help mitigate adversarial effects. On the practical side, we extend CB$^2$O to the clustered federated learning setting by proposing FedCB$^2$O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB$^2$O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.

On Bellman equations for continuous-time policy evaluation I: discretization and approximation

Jul 08, 2024

Wenlong Mou, Yuhua Zhu

Abstract: We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as the approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structures, even if the effective horizon diverges to infinity.

* WM and YZ contributed equally to this work

FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-based Optimization

May 04, 2023

Jose A. Carrillo, Nicolas Garcia Trillos, Sixu Li, Yuhua Zhu

Abstract: Federated learning is an important framework in modern machine learning that seeks to integrate the training of learning models from multiple users, each user having their own local data set, in a way that is sensitive to data privacy and to communication loss constraints. In clustered federated learning, one assumes an additional unknown group structure among users, and the goal is to train models that are useful for each group, rather than simply training a single global model for all users. In this paper, we propose a novel solution to the problem of clustered federated learning that is inspired by ideas in consensus-based optimization (CBO). Our new CBO-type method is based on a system of interacting particles that is oblivious to group memberships. Our model is motivated by rigorous mathematical reasoning, including a mean field analysis describing the large number of particles limit of our particle system, as well as convergence guarantees for the simultaneous global optimization of general non-convex objective functions (corresponding to the loss functions of each cluster of users) in the mean-field regime. Experimental results demonstrate the efficacy of our FedCBO algorithm compared to other state-of-the-art methods and help validate our methodological and theoretical work.
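
For readers unfamiliar with consensus-based optimization, the following is a minimal sketch of a generic single-objective CBO update on scalar particles. It is not the FedCBO system from the paper: the function names, the one-dimensional setting, and the parameters `alpha`, `lam`, `sigma`, and `dt` are all illustrative assumptions. Each particle drifts toward a Gibbs-weighted average of the ensemble, with optional noise scaled by its distance to that average.

```python
import math
import random


def consensus_point(particles, loss, alpha):
    """Gibbs-weighted average of the particles; low-loss particles dominate."""
    losses = [loss(x) for x in particles]
    shift = min(losses)  # subtract the minimum loss for numerical stability
    weights = [math.exp(-alpha * (value - shift)) for value in losses]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, particles)) / total


def cbo_step(particles, loss, rng, alpha=10.0, lam=1.0, sigma=0.0, dt=0.1):
    """One Euler-Maruyama step: drift toward the consensus point plus scaled noise.

    All default parameter values are hypothetical choices for illustration.
    """
    m = consensus_point(particles, loss, alpha)
    updated = []
    for x in particles:
        noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        updated.append(x - lam * (x - m) * dt + sigma * abs(x - m) * noise)
    return updated


def quadratic(x):
    """Toy objective with its minimum at x = 1 (an assumed example)."""
    return (x - 1.0) ** 2


rng = random.Random(0)
particles = [-2.0, 0.5, 3.0]
for _ in range(50):
    particles = cbo_step(particles, quadratic, rng, sigma=0.5)
```

With `sigma=0.0` the update is a pure contraction: every particle moves a fraction `lam * dt` of the way toward the consensus point, so the spread of the ensemble shrinks by the factor `1 - lam * dt` per step.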

Continuous-in-time Limit for Bayesian Bandits

Oct 14, 2022

Yuhua Zhu, Zach Izzo, Lexing Ying

Abstract: This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges to a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit problems with large horizons. Our method has the added benefit that its computational cost does not increase as the horizon increases.

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

Dec 02, 2021

Xiaowu Dai, Yuhua Zhu

Abstract: We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate the mini-batch SGD and the momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of SDE and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and the relationship with large batch and sharp minima. In particular, we find that the stochastic process solution tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically with various datasets and models.

Operator Augmentation for Model-based Policy Evaluation

Oct 25, 2021

Xun Tang, Lexing Ying, Yuhua Zhu

Abstract: In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator augmentation method for reducing the error introduced by the estimated model. When the error is in the residual norm, we prove that the augmentation factor is always positive and upper bounded by $1 + O(1/n)$, where $n$ is the number of samples used in learning each row of the transition matrix. We also propose a practical numerical algorithm for implementing the operator augmentation.
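
The bias mentioned in the abstract can be illustrated on a toy problem. The sketch below does not implement operator augmentation; it only shows that plugging an unbiased estimate of a transition row into the value equation $(I - \gamma P) V = r$ gives a biased value. The two-state chain, the reward vector, the discount factor, and the simplification that only the first row is estimated (from `n` samples) are all assumptions made for illustration. The expectation over the binomial sampling distribution is computed exactly by enumeration.

```python
from math import comb

GAMMA = 0.9          # assumed discount factor
REWARD = (1.0, 0.0)  # assumed reward vector


def value_two_state(p00, p10, gamma=GAMMA, reward=REWARD):
    """Solve (I - gamma * P) V = r in closed form for a 2-state chain."""
    p01, p11 = 1.0 - p00, 1.0 - p10
    det = (1.0 - gamma * p00) * (1.0 - gamma * p11) - gamma**2 * p01 * p10
    v0 = ((1.0 - gamma * p11) * reward[0] + gamma * p01 * reward[1]) / det
    v1 = (gamma * p10 * reward[0] + (1.0 - gamma * p00) * reward[1]) / det
    return v0, v1


def expected_plugin_value(p00, p10, n):
    """Exact expectation of the plug-in V[0] when row 0 is estimated from n samples.

    The estimate k / n of p00 is unbiased, with k ~ Binomial(n, p00).
    Row 1 is treated as known (a simplification for this sketch).
    """
    total_prob = 0.0
    expected_v0 = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p00**k * (1.0 - p00) ** (n - k)
        v0_hat, _ = value_two_state(k / n, p10)
        total_prob += prob
        expected_v0 += prob * v0_hat
    return expected_v0, total_prob


true_v0, _ = value_two_state(0.7, 0.4)
plugin_v0, _ = expected_plugin_value(0.7, 0.4, n=10)
bias = plugin_v0 - true_v0  # strictly positive here, by Jensen's inequality
```

In this toy setting the value is a strictly convex function of the estimated entry, so the plug-in value overestimates on average even though the estimated entry itself is unbiased.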

Variational Actor-Critic Algorithms

Aug 15, 2021

Yuhua Zhu, Lexing Ying

Abstract: We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides the vanilla gradient descent with both the value function and the policy updates, we propose two variants, the clipping method and the flipping method, in order to speed up the convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.

Why resampling outperforms reweighting for correcting sampling bias

Sep 28, 2020

Jing An, Lexing Ying, Yuhua Zhu

Abstract: A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for potential biases. We consider two commonly-used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though statistically equivalent, it has been observed that resampling outperforms reweighting when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together while addressing the sampling bias.
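
The statistical equivalence of the two corrections can be sketched on a toy data set. The example below is an illustration under assumed group labels, values, and proportions, not the paper's experiments: reweighting multiplies each sample by the ratio of target to sampled group proportion, while resampling draws samples with probability proportional to that same ratio. Both target the same rebalanced mean; the sketch does not reproduce the difference in behavior under stochastic gradient algorithms that the paper analyzes.

```python
import random


def group_weights(sample_props, target_props):
    """Importance weight per group: target proportion / sampled proportion."""
    return {g: target_props[g] / sample_props[g] for g in sample_props}


def reweighted_mean(values, groups, weights):
    """Weighted average in which each sample carries its group's weight."""
    numerator = sum(weights[g] * v for v, g in zip(values, groups))
    denominator = sum(weights[g] for g in groups)
    return numerator / denominator


def resample(values, groups, weights, k, rng):
    """Draw k samples with probability proportional to each group's weight."""
    probabilities = [weights[g] for g in groups]
    return rng.choices(values, weights=probabilities, k=k)


# Hypothetical biased data set: group "a" is over-represented (90% vs. 50%).
values = [0.0] * 9 + [10.0]
groups = ["a"] * 9 + ["b"]
weights = group_weights({"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5})

rw_mean = reweighted_mean(values, groups, weights)  # rebalanced mean: 5.0
resampled = resample(values, groups, weights, k=20000, rng=random.Random(0))
rs_mean = sum(resampled) / len(resampled)           # close to 5.0 on average
```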