Tri Dao

Marconi: Prefix Caching for the Era of Hybrid LLMs

Nov 28, 2024

RedPajama: an Open Dataset for Training Large Language Models

Nov 19, 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Aug 27, 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Jul 13, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Jul 11, 2024

An Empirical Study of Mamba-based Language Models

Jun 12, 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

May 31, 2024

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Mar 05, 2024

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Feb 28, 2024