Xunliang Cai

Alphabetical order by last name

LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

Jun 11, 2026
Via arXiv

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

Jun 11, 2026
Via arXiv

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

Jun 09, 2026
Via arXiv

STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios

Jun 09, 2026
Via arXiv

Asuka-Bench: Benchmarking Code Agents on Underspecified User Intent and Multi-Round Refinement

Jun 04, 2026
Via arXiv

SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems

Jun 02, 2026
Via arXiv

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

Jun 01, 2026
Via arXiv

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

May 29, 2026
Via arXiv

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

May 27, 2026
Via arXiv

ATLAS: All-round Testing of Long-context Abilities across Scales

May 27, 2026
Via arXiv