Kevin Y. Li

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Feb 27, 2025

Inference Optimal VLMs Need Only One Visual Token but Larger Models
Nov 05, 2024

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aug 19, 2024