
Chiwun Yang

Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency

Mar 18, 2025

ParallelComp: Parallel Long-Context Compressor for Length Extrapolation

Feb 20, 2025

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

Dec 08, 2024

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

Nov 03, 2024

Toward Infinite-Long Prefix in Transformer

Jun 20, 2024

Attention is Naturally Sparse with Gaussian Distributed Input

Apr 03, 2024

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

Feb 02, 2024

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

Nov 24, 2023

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

Nov 22, 2023

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

Oct 19, 2023