
Shimao Chen

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

Feb 03, 2026

MiMo-V2-Flash Technical Report

Jan 08, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Dec 29, 2025

Fairy$\pm i$: the First 2-bit Complex LLM with All Parameters in $\{\pm1, \pm i\}$

Aug 07, 2025

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

May 12, 2025

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

Sep 26, 2024

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

Apr 25, 2024