Yichuan Deng

Attention is Naturally Sparse with Gaussian Distributed Input

Apr 03, 2024

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

Feb 02, 2024

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

Oct 19, 2023

Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

Oct 18, 2023

Clustered Linear Contextual Bandits with Knapsacks

Aug 21, 2023

Convergence of Two-Layer Regression with Nonlinear Units

Aug 16, 2023

Zero-th Order Algorithm for Softmax Attention Optimization

Jul 17, 2023

Faster Robust Tensor Power Method for Arbitrary Order

Jun 01, 2023

Attention Scheme Inspired Softmax Regression

Apr 26, 2023

Solving Tensor Low Cycle Rank Approximation

Apr 13, 2023