Zhao Song

Circuit Complexity Bounds for RoPE-based Transformer Architecture

Nov 12, 2024

On Differentially Private String Distances

Nov 08, 2024

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

Nov 03, 2024

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent

Oct 15, 2024

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Oct 15, 2024

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

Oct 15, 2024

HSR-Enhanced Sparse Attention Acceleration

Oct 14, 2024

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Oct 12, 2024

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

Oct 12, 2024

Log-concave Sampling over a Convex Body with a Barrier: a Robust and Unified Dikin Walk

Oct 08, 2024