
Aviv Bick

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025

Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing

Feb 23, 2025

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

Aug 19, 2024