Atri Rudra

Just read twice: closing the recall gap for recurrent language models

Jul 07, 2024

Simple linear attention language models balance the recall-throughput tradeoff

Feb 28, 2024

Zoology: Measuring and Improving Recall in Efficient Language Models

Dec 08, 2023

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Oct 28, 2023

Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

Oct 18, 2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling

Feb 13, 2023

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Dec 28, 2022

Arithmetic Circuits, Structured Matrices and (not so) Deep Learning

Jun 24, 2022

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Jun 24, 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

May 27, 2022