Title: MatMamba: A Matryoshka State Space Model

Oct 09, 2024

Abhinav Shukla, Sai Vemprala, Aditya Kusupati, Ashish Kapoor

Abstract: State Space Models (SSMs) like Mamba2 are a promising alternative to Transformers, with theoretically faster training and inference, especially at long context lengths. Recent work on Matryoshka Representation Learning, and its application to Transformer backbones in works like MatFormer, showed how to introduce nested granularities of smaller submodels in one universal elastic model. In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2 by modifying the block to contain nested dimensions, enabling joint training and adaptive inference. MatMamba allows for efficient and adaptive deployment across various model sizes. We train a single large MatMamba model and obtain a number of smaller nested models for free, while maintaining or improving upon the performance of a baseline smaller model trained from scratch. We train language and image models at a variety of parameter sizes from 35M to 1.4B. Our results on ImageNet and FineWeb show that MatMamba models scale comparably to Transformers while having more efficient inference characteristics. This makes MatMamba a practically viable option for deploying large-scale models elastically, based on the available inference compute. Code and models are open sourced at https://github.com/ScaledFoundations/MatMamba

* 10 pages, 7 figures
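
The "nested dimensions" idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation and does not reproduce the Mamba2 block. It assumes only that a smaller submodel reuses a prefix of a larger model's weights, here the first rows of one shared projection matrix. The names make_weights, nested_project, and param_count are hypothetical helpers introduced for illustration.

```python
import random


def make_weights(d_out, d_in, seed=0):
    """Build a toy (d_out x d_in) weight matrix shared by all submodels."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]


def nested_project(weights, x, width):
    """Project x using only the first `width` output rows of the shared matrix.

    Assumption for illustration: a nested submodel is a prefix slice of the
    full model's weights, so no separate parameters are stored for it.
    """
    if not 1 <= width <= len(weights):
        raise ValueError("width must be between 1 and the full output size")
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights[:width]]


def param_count(weights, width):
    """Number of weights the submodel of the given width actually uses."""
    return width * len(weights[0])


# One shared matrix, three nested granularities (full, half, quarter).
W = make_weights(d_out=8, d_in=4)
x = [0.5, -1.0, 2.0, 0.25]

full = nested_project(W, x, 8)
half = nested_project(W, x, 4)
quarter = nested_project(W, x, 2)

# Each smaller output is a prefix of the larger one, because the
# submodels share the same leading rows of W.
print(half == full[:4], quarter == half[:2])
print([param_count(W, k) for k in (8, 4, 2)])
```

In this toy setting, choosing a width at inference time selects a smaller model without retraining or storing extra weights. Joint training of the nested widths, as the abstract describes, is not shown here.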
