Chaojun Xiao

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Jan 29, 2026

Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention

Jan 29, 2026

Revealing the Attention Floating Mechanism in Masked Diffusion Models

Jan 12, 2026

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025

Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data

May 08, 2025

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

Feb 17, 2025

Densing Law of LLMs

Dec 05, 2024

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

Nov 04, 2024

Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs

Oct 09, 2024