Gal Vardi

To Grok Grokking: Provable Grokking in Ridge Regression

Jan 27, 2026

On the Hardness of Learning Regular Expressions

Oct 06, 2025

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

May 25, 2025

A Theory of Learning with Autoregressive Chain of Thought

Mar 11, 2025

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

Oct 29, 2024

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Oct 24, 2024

Provable Privacy Attacks on Trained Shallow Neural Networks

Oct 10, 2024

Benign Overfitting in Single-Head Attention

Oct 10, 2024

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

Oct 02, 2024

Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality

Sep 05, 2024