
Chiwun Yang

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

Nov 03, 2024

Toward Infinite-Long Prefix in Transformer

Jun 20, 2024

Attention is Naturally Sparse with Gaussian Distributed Input

Apr 03, 2024

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

Feb 02, 2024

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

Nov 24, 2023

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

Nov 22, 2023

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

Oct 19, 2023

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

Oct 17, 2023

Fine-tune Language Models to Approximate Unbiased In-context Learning

Oct 05, 2023

How to Protect Copyright Data in Optimization of Large Language Models?

Aug 23, 2023