Zhao Song

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

Dec 09, 2024

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

Dec 08, 2024

On Socially Fair Low-Rank Approximation and Column Subset Selection

Dec 08, 2024

On the Expressive Power of Modern Hopfield Networks

Dec 07, 2024

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Nov 25, 2024

Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training

Nov 25, 2024

Circuit Complexity Bounds for RoPE-based Transformer Architecture

Nov 12, 2024

On Differentially Private String Distances

Nov 08, 2024

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

Nov 03, 2024

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Oct 15, 2024