Alexander M. Rush

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Apr 14, 2025

Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?

Apr 02, 2025

NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics

Dec 30, 2024

Compute-Constrained Data Selection

Oct 21, 2024

Contextual Document Embeddings

Oct 03, 2024

A Controlled Study on Long Context Extension and Generalization in LLMs

Sep 18, 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Aug 27, 2024

I Could've Asked That: Reformulating Unanswerable Questions

Jul 24, 2024

Entity Disambiguation via Fusion Entity Decoding

Apr 02, 2024

Diffusion Models Without Attention

Nov 30, 2023