
Chaojun Xiao

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

Jun 17, 2026
Via arXiv

Rethinking the Role of Efficient Attention in Hybrid Architectures

Jun 13, 2026
Via arXiv

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

May 11, 2026
Via arXiv

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026
Via arXiv

Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

Apr 03, 2026
Via arXiv

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Feb 12, 2026
Via arXiv

Data Science and Technology Towards AGI Part I: Tiered Data Management

Feb 09, 2026
Via arXiv

Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention

Jan 29, 2026
Via arXiv

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Jan 29, 2026
Via arXiv

Revealing the Attention Floating Mechanism in Masked Diffusion Models

Jan 12, 2026
Via arXiv