Xiangxin Zhou

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Jun 09, 2026
Via arXiv

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Jun 09, 2026
Via arXiv

Exploring the Design Space of Reward Backpropagation for Flow Matching

Jun 09, 2026
Via arXiv

Rethinking the Divergence Regularization in LLM RL

Jun 08, 2026
Via arXiv

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

May 28, 2026
Via arXiv

h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network

Apr 25, 2026
Via arXiv

Rethinking the Trust Region in LLM Reinforcement Learning

Feb 04, 2026
Via arXiv

Defeating the Training-Inference Mismatch via FP16

Oct 30, 2025
Via arXiv

Variational Reasoning for Language Models

Sep 26, 2025
Via arXiv

ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Sep 10, 2025
Via arXiv