Yanghua Xiao

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Mar 10, 2026 (via arXiv)

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

Mar 09, 2026 (via arXiv)

HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

Jan 29, 2026 (via arXiv)

HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns

Jan 15, 2026 (via arXiv)

Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning

Jan 12, 2026 (via arXiv)

Structured Reasoning for Large Language Models

Jan 12, 2026 (via arXiv)

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Jan 10, 2026 (via arXiv)

Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

Jan 08, 2026 (via arXiv)

Why Did Apple Fall To The Ground: Evaluating Curiosity In Large Language Model

Oct 23, 2025 (via arXiv)

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Oct 16, 2025 (via arXiv)