Xunliang Cai

Alphabetical order by last name

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Mar 17, 2026
Via arXiv

Can RL Improve Generalization of LLM Agents? An Empirical Study

Mar 12, 2026
Via arXiv

$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts

Mar 11, 2026
Via arXiv

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Mar 02, 2026
Via arXiv

Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Mar 02, 2026
Via arXiv

TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training

Mar 02, 2026
Via arXiv

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization

Feb 22, 2026
Via arXiv

MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning

Feb 19, 2026
Via arXiv

Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws

Feb 15, 2026
Via arXiv

SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining

Feb 12, 2026
Via arXiv