Kevin Y. Li

Mamba-3: Improved Sequence Modeling using State Space Principles

Mar 16, 2026
Via arXiv

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025
Via arXiv

Inference Optimal VLMs Need Only One Visual Token but Larger Models

Nov 05, 2024
Via arXiv

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

Aug 19, 2024
Via arXiv