Taiji Suzuki

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Dec 21, 2025

Sliding Window Recurrences for Sequence Models

Dec 15, 2025

Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

Nov 18, 2025

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Nov 10, 2025

Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning

Nov 10, 2025

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

Jun 12, 2025

On the Role of Label Noise in the Feature Learning Process

May 25, 2025

Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models

May 12, 2025

Quantifying Memory Utilization with Effective State-Size

Apr 28, 2025

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Apr 24, 2025