Chaojun Xiao

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

May 11, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026

Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

Apr 03, 2026

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Feb 12, 2026

Data Science and Technology Towards AGI Part I: Tiered Data Management

Feb 09, 2026

Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention

Jan 29, 2026

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Jan 29, 2026

Revealing the Attention Floating Mechanism in Masked Diffusion Models

Jan 12, 2026

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025