Andrey Gromov

A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs

Jan 23, 2026
Via arXiv

On the origin of neural scaling laws: from random graphs to natural language

Jan 15, 2026
Via arXiv

PARQ: Piecewise-Affine Regularized Quantization

Mar 19, 2025
Via arXiv

Spectral Journey: How Transformers Predict the Shortest Path

Feb 12, 2025
Via arXiv

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Jun 13, 2024
Via arXiv

Grokking Modular Polynomials

Jun 05, 2024
Via arXiv

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Jun 04, 2024
Via arXiv

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Apr 01, 2024
Via arXiv

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024
Via arXiv

Bridging Associative Memory and Probabilistic Modeling

Feb 15, 2024
Via arXiv