Taiji Suzuki

Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble

Feb 09, 2025
Via arXiv

Direct Distributional Optimization for Provable Alignment of Diffusion Models

Feb 05, 2025
Via arXiv

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Feb 02, 2025
Via arXiv

Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

Jan 09, 2025
Via arXiv

On the Comparison between Multi-modal and Single-modal Contrastive Learning

Nov 05, 2024
Via arXiv

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

Nov 04, 2024
Via arXiv

Pretrained transformer efficiently learns low-dimensional target functions in-context

Nov 04, 2024
Via arXiv

Dimensionality-induced information loss of outliers in deep neural networks

Oct 29, 2024
Via arXiv

Transformers Provably Solve Parity Efficiently with Chain of Thought

Oct 11, 2024
Via arXiv

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024
Via arXiv