Zhenmei Shi

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Dec 09, 2024

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Dec 08, 2024

On the Expressive Power of Modern Hopfield Networks
Dec 07, 2024

Circuit Complexity Bounds for RoPE-based Transformer Architecture
Nov 12, 2024

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Oct 15, 2024

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Oct 15, 2024

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Oct 15, 2024

HSR-Enhanced Sparse Attention Acceleration
Oct 14, 2024

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
Oct 12, 2024

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Oct 12, 2024