Taiji Suzuki

Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality

Feb 02, 2026
Via arXiv

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning

Feb 02, 2026
Via arXiv

Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO

Feb 02, 2026
Via arXiv

Zero-Flow Encoders

Jan 31, 2026
Via arXiv

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Dec 21, 2025
Via arXiv

Sliding Window Recurrences for Sequence Models

Dec 15, 2025
Via arXiv

Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

Nov 18, 2025
Via arXiv

Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning

Nov 10, 2025
Via arXiv

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Nov 10, 2025
Via arXiv

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

Jun 12, 2025
Via arXiv