Chaojun Xiao

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

May 11, 2026 (via arXiv)

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026 (via arXiv)

Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

Apr 03, 2026 (via arXiv)

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Feb 12, 2026 (via arXiv)

Data Science and Technology Towards AGI Part I: Tiered Data Management

Feb 09, 2026 (via arXiv)

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Jan 29, 2026 (via arXiv)

Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention

Jan 29, 2026 (via arXiv)

Revealing the Attention Floating Mechanism in Masked Diffusion Models

Jan 12, 2026 (via arXiv)

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025 (via arXiv)

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025 (via arXiv)