Ali Behrouz

Nested Learning: The Illusion of Deep Learning Architectures

Dec 31, 2025

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

Dec 29, 2025

Trellis: Learning to Compress Key-Value Memory in Attention Models

Dec 29, 2025

TNT: Improving Chunkwise Training for Test-Time Memorization

Nov 10, 2025

ATLAS: Learning to Optimally Memorize the Context at Test Time

May 29, 2025

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Apr 17, 2025

Titans: Learning to Memorize at Test Time

Dec 31, 2024

Best of Both Worlds: Advantages of Hybrid Graph Sequence Models

Nov 23, 2024

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Jun 06, 2024

MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection

Mar 29, 2024