Junxiong Wang

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention

Mar 18, 2026 (via arXiv)

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Mar 04, 2026 (via arXiv)

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

Feb 06, 2026 (via arXiv)

Distilling Token-Trained Models into Byte-Level Models

Feb 01, 2026 (via arXiv)

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training

Nov 17, 2025 (via arXiv)

OverFill: Two-Stage Models for Efficient Language Model Decoding

Aug 11, 2025 (via arXiv)

Fairness Practices in Industry: A Case Study in Machine Learning Teams Building Recommender Systems

May 26, 2025 (via arXiv)

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Apr 14, 2025 (via arXiv)

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025 (via arXiv)

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Aug 27, 2024 (via arXiv)