Zhihua Zhang

Last Iterate Analyses of FTRL in Stochastic Bandits

Oct 26, 2025

Jingxin Zhan, Yuze Han, Zhihua Zhang

Abstract: The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing analyses focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored, especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by the regular function $\Psi(p)=-4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
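
To make the quantity in this abstract concrete, the following minimal sketch evaluates the Bregman divergence $D_\Psi(e_{i^*}, p) = \Psi(e_{i^*}) - \Psi(p) - \langle \nabla\Psi(p), e_{i^*} - p \rangle$ for $\Psi(p)=-4\sum_i\sqrt{p_i}$. The argument order (point mass first, iterate second), the function names, and the example distributions are assumptions made for illustration; the abstract does not specify them.

```python
import math


def psi(p):
    """Regularizer Psi(p) = -4 * sum_i sqrt(p_i) from the abstract."""
    return -4.0 * sum(math.sqrt(x) for x in p)


def bregman_from_point_mass(p, best):
    """D_Psi(e_best, p) = Psi(e_best) - Psi(p) - <grad Psi(p), e_best - p>.

    Assumes p is a strictly positive probability vector, so that the
    gradient -2 / sqrt(p_i) is finite in every coordinate.
    """
    if any(x <= 0.0 for x in p):
        raise ValueError("p must be strictly positive")
    e_best = [1.0 if i == best else 0.0 for i in range(len(p))]
    grad = [-2.0 / math.sqrt(x) for x in p]
    inner = sum(g * (q - x) for g, q, x in zip(grad, e_best, p))
    return psi(e_best) - psi(p) - inner


def bregman_closed_form(p, best):
    """Same quantity simplified: -4 + 2 * sum_i sqrt(p_i) + 2 / sqrt(p_best)."""
    return -4.0 + 2.0 * sum(math.sqrt(x) for x in p) + 2.0 / math.sqrt(p[best])


# Hypothetical arm distributions: uniform, then concentrated on arm 0.
uniform = [0.25, 0.25, 0.25, 0.25]
concentrated = [0.97, 0.01, 0.01, 0.01]
print(bregman_from_point_mass(uniform, best=0))
print(bregman_from_point_mass(concentrated, best=0))
```

The divergence shrinks as the distribution concentrates on the optimal arm, which is the sense in which its decay measures last-iterate convergence.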

AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent

Apr 15, 2025

Pu Wang, Zhihua Zhang, Dianjie Lu, Guijuan Zhang, Youshan Zhang, Zhuoran Zheng

Abstract: Because of human and environmental interference, captured polyp images often suffer from dim lighting, blur, and overexposure, which pose challenges for downstream polyp segmentation tasks. To address this noise-induced degradation in polyp images, we present AgentPolyp, a novel framework integrating CLIP-based semantic guidance and dynamic image enhancement with a lightweight neural network for segmentation. The agent first evaluates image quality using CLIP-driven semantic analysis (e.g., identifying "low-contrast polyps with vascular textures") and adapts reinforcement learning strategies to dynamically apply multi-modal enhancement operations (e.g., denoising, contrast adjustment). A quality assessment feedback loop optimizes pixel-level enhancement and segmentation focus in a collaborative manner, ensuring robust preprocessing before neural network segmentation. This modular architecture supports plug-and-play extensions for various enhancement algorithms and segmentation networks, meeting deployment requirements for endoscopic devices.

Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Apr 09, 2025

Jingxin Zhan, Zhihua Zhang

Abstract: We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ arms from the total $d$ arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy, which, however, requires explicitly computing the arm-selection probabilities by solving an optimization problem at each time step and sampling according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms with the $m$ smallest (estimated) losses after random perturbation. In this paper, we show that FTPL with a Fr\'echet perturbation also enjoys the optimal regret bound $\mathcal{O}(\sqrt{nmd})$ in the adversarial setting and achieves best-of-both-worlds regret bounds, i.e., achieves logarithmic regret in the stochastic setting.
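
The arm-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Fréchet shape parameter of 2, the perturbation form `loss - r / eta`, the learning rate `eta`, and all function names are assumptions, and the loss-estimation step is omitted entirely.

```python
import math
import random


def frechet_sample(rng, shape=2.0):
    """Draw from a Frechet distribution with CDF exp(-x**(-shape)), x > 0,
    by inverting the CDF at a uniform draw u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / shape)


def ftpl_select(cum_loss_estimates, m, eta, rng, shape=2.0):
    """Return the indices of the m arms with the smallest perturbed loss.

    Perturbed loss of arm i: cum_loss_estimates[i] - r_i / eta, where the
    r_i are i.i.d. Frechet draws. No optimization problem is solved and no
    arm-selection probabilities are computed.
    """
    d = len(cum_loss_estimates)
    if not 0 < m <= d:
        raise ValueError("m must satisfy 0 < m <= d")
    perturbed = [
        loss - frechet_sample(rng, shape) / eta for loss in cum_loss_estimates
    ]
    ranked = sorted(range(d), key=lambda i: perturbed[i])
    return ranked[:m]


# Hypothetical cumulative loss estimates for d = 5 arms; select m = 2.
rng = random.Random(0)
chosen = ftpl_select([3.0, 0.5, 2.0, 0.1, 4.0], m=2, eta=1.0, rng=rng)
print(sorted(chosen))
```

Selection costs one sort per round, in contrast with the per-round optimization that FTRL needs to obtain its sampling distribution.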

PolypFlow: Reinforcing Polyp Segmentation with Flow-Driven Dynamics

Feb 26, 2025

Pu Wang, Huaizhi Ma, Zhihua Zhang, Zhuoran Zheng

Abstract: Accurate polyp segmentation remains challenging due to irregular lesion morphologies, ambiguous boundaries, and heterogeneous imaging conditions. While U-Net variants excel at local feature fusion, they often lack explicit mechanisms to model the dynamic evolution of segmentation confidence under uncertainty. Inspired by the interpretable nature of flow-based models, we present PolypFLow, a flow-matching enhanced architecture that injects physics-inspired optimization dynamics into segmentation refinement. Unlike conventional cascaded networks, our framework solves an ordinary differential equation (ODE) to progressively align coarse initial predictions with ground truth masks through learned velocity fields. This trajectory-based refinement offers two key advantages: 1) Interpretable Optimization: Intermediate flow steps visualize how the model corrects under-segmented regions and sharpens boundaries at each ODE-solver iteration, demystifying the "black-box" refinement process; 2) Boundary-Aware Robustness: The flow dynamics explicitly model gradient directions along polyp edges, enhancing resilience to low-contrast regions and motion artifacts. Extensive experimental results show that PolypFLow achieves state-of-the-art performance while remaining consistent across different lighting scenarios.

Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

Feb 20, 2025

Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang

Abstract: In this paper, we investigate the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy $\pi$. Prior works on statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of the classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return using streaming data is no more difficult than learning its expectation (i.e., the value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ exponential stability arguments for products of random matrices. Our findings provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

* 57 pages

A Regularized Online Newton Method for Stochastic Convex Bandits with Linear Vanishing Noise

Jan 19, 2025

Jingxin Zhan, Yuchen Xin, Kaicheng Jin, Zhihua Zhang

Abstract: We study a stochastic convex bandit problem where the subgaussian noise parameter is assumed to decrease linearly as the learner selects actions closer and closer to the minimizer of the convex loss function. Accordingly, we propose a Regularized Online Newton Method (RONM) for solving the problem, based on the Online Newton Method (ONM) of arXiv:2406.06506. Our RONM reaches a polylogarithmic regret in the time horizon $n$ when the loss function grows quadratically in the constraint set, which recovers the results of arXiv:2402.12042 in linear bandits. Our analyses rely on the growth rate of the precision matrix $\Sigma_t^{-1}$ in ONM and we find that linear growth solves the question exactly. These analyses also help us obtain better convergence rates when the loss function grows faster. We also study and analyze two new bandit models: stochastic convex bandits with noise scaled to a subgaussian parameter function and convex bandits with stochastic multiplicative noise.

Decoupled Functional Central Limit Theorems for Two-Time-Scale Stochastic Approximation

Dec 22, 2024

Yuze Han, Xiang Li, Jiadong Liang, Zhihua Zhang

Abstract: In two-time-scale stochastic approximation (SA), two iterates are updated at different rates, governed by distinct step sizes, with each update influencing the other. Previous studies have demonstrated that the convergence rates of the error terms for these updates depend solely on their respective step sizes, a property known as decoupled convergence. However, a functional version of this decoupled convergence has not been explored. Our work fills this gap by establishing decoupled functional central limit theorems for two-time-scale SA, offering a more precise characterization of its asymptotic behavior. To achieve these results, we leverage the martingale problem approach and establish tightness as a crucial intermediate step. Furthermore, to address the interdependence between different time scales, we introduce an innovative auxiliary sequence to eliminate the primary influence of the fast-time-scale update on the slow-time-scale update.
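
For readers unfamiliar with the setup, here is a minimal sketch of a two-time-scale SA recursion with coupled updates and distinct step sizes. The toy linear system, the step-size exponents, the noise model, and the function name are all hypothetical choices for illustration; none of them come from the paper.

```python
import random


def two_time_scale_sa(num_steps, noise_std=0.1, seed=0):
    """Toy linear two-time-scale SA (hypothetical example).

    The fast iterate y tracks 2 * x; the slow iterate x solves x + y = 3.
    The joint root is (x, y) = (1, 2).
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for t in range(num_steps):
        alpha = 1.0 / (t + 2)          # slow step size
        beta = 1.0 / (t + 2) ** 0.6    # fast step size (decays more slowly)
        noise_x = rng.gauss(0.0, noise_std)
        noise_y = rng.gauss(0.0, noise_std)
        # Both updates read the previous (x, y), so each influences the other.
        new_x = x + alpha * (3.0 - x - y + noise_x)
        new_y = y + beta * (2.0 * x - y + noise_y)
        x, y = new_x, new_y
    return x, y


x_final, y_final = two_time_scale_sa(num_steps=20000)
print(x_final, y_final)
```

In this sketch the two iterates share noise of the same scale but use different step sizes, which is the setting in which decoupled convergence says each error term is governed by its own step size.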

Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach

Nov 21, 2024

Xian-Xian Liu, Mingkun Xu, Yuanyuan Wei, Huafeng Qin, Qun Song, Simon Fong, Feng Tien, Wei Luo, Juntao Gao, Zhihua Zhang (+1 more)

Abstract: Timely and precise classification and segmentation of gastric bleeding in endoscopic imagery are pivotal for the rapid diagnosis and intervention of gastric complications, which is critical in life-saving medical procedures. Traditional methods grapple with the challenge posed by the indistinguishable intensity values of bleeding tissues adjacent to other gastric structures. Our study seeks to revolutionize this domain by introducing a novel deep learning model, the Dual Spatial Kernelized Constrained Fuzzy C-Means (Deep DuS-KFCM) clustering algorithm. This Hybrid Neuro-Fuzzy system synergizes Neural Networks with Fuzzy Logic to offer a highly precise and efficient identification of bleeding regions. Implementing a two-fold coarse-to-fine strategy for segmentation, this model initially employs the Spatial Kernelized Fuzzy C-Means (SKFCM) algorithm enhanced with spatial intensity profiles and subsequently harnesses the state-of-the-art DeepLabv3+ with ResNet50 architecture to refine the segmentation output. Through extensive experiments across mainstream gastric bleeding and red spots datasets, our Deep DuS-KFCM model demonstrated an unprecedented accuracy of 87.95%, coupled with a specificity of 96.33%, outperforming contemporary segmentation methods. The findings underscore the model's robustness against noise and its outstanding segmentation capabilities, particularly for identifying subtle bleeding symptoms, thereby presenting a significant leap forward in medical image processing.

Asymptotic Time-Uniform Inference for Parameters in Averaged Stochastic Approximation

Oct 19, 2024

Chuhan Xie, Kaicheng Jin, Jiadong Liang, Zhihua Zhang

Abstract: We study time-uniform statistical inference for parameters in stochastic approximation (SA), which encompasses a range of applications in optimization and machine learning. To that end, we analyze the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems. We then construct three types of asymptotic confidence sequences that are valid uniformly across all times with coverage guarantees, in an asymptotic sense that the starting time is sufficiently large. These coverage guarantees remain valid if the unknown covariance matrix is replaced by its plug-in estimator, and we conduct experiments to validate our methodology.
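
The "averaged iterates" mentioned here are commonly formed by running an SA recursion and keeping a running mean of its iterates (Polyak-Ruppert averaging). The sketch below shows this on a hypothetical mean-estimation problem; the step-size exponent 0.7, the Gaussian data model, and the function name are assumptions, and the confidence-sequence construction from the paper is not attempted.

```python
import random


def averaged_sa(num_steps, true_mean=2.0, noise_std=1.0, seed=0):
    """Run SA for the root of E[x - Z] = 0 and average the iterates.

    Returns (last iterate, running average of all iterates).
    """
    rng = random.Random(seed)
    x = 0.0
    x_bar = 0.0
    for t in range(1, num_steps + 1):
        z = rng.gauss(true_mean, noise_std)  # noisy observation
        step = t ** -0.7                     # decays more slowly than 1 / t
        x -= step * (x - z)                  # SA update
        x_bar += (x - x_bar) / t             # running mean of x_1, ..., x_t
    return x, x_bar


last_iterate, averaged_iterate = averaged_sa(num_steps=5000)
print(last_iterate, averaged_iterate)
```

The averaged iterate is the quantity whose fluctuations the inference procedure would be built around; the last iterate is returned only for comparison.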

* 35 pages, 4 figures

Federated Control in Markov Decision Processes

May 07, 2024

Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang

Abstract: We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the training process. In the face of the differences among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on sample complexity of derived algorithms FedQ-X with the RL oracle , and finally conduct a thorough study on the sample complexity of FedQ-SynQ. Specifically, FedQ-X has been shown to enjoy linear speedup in terms of sample complexity when workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to justify the efficiency of our methods.
