
Hongkang Li

Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs

Apr 11, 2026

Visual prompting reimagined: The power of the Activation Prompts

Apr 07, 2026

A Theoretical Analysis of Mamba's Training Dynamics: Filtering Relevant Features for Generalization in State Space Models

Feb 13, 2026

Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

Jun 10, 2025

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Apr 15, 2025

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Oct 03, 2024

Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis

Jun 24, 2024

What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding

Jun 04, 2024

How does promoting the minority fraction affect generalization? A theoretical study of one-hidden-layer neural networks on group imbalance

Mar 19, 2024

Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis

Feb 23, 2024