Xingwu Chen

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Aug 08, 2024
Via arXiv

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

Apr 02, 2024
Via arXiv