Albert Gu

Machine Learning Department, Carnegie Mellon University

Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences

Mar 20, 2025

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025

Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing

Feb 23, 2025

On the Benefits of Memory for Modeling Time-Dependent PDEs

Sep 03, 2024

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

Aug 19, 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Jul 13, 2024

An Empirical Study of Mamba-based Language Models

Jun 12, 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

May 31, 2024

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Mar 05, 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Feb 29, 2024