Andrew Gordon Wilson

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

Jul 09, 2025

Out-of-Distribution Detection Methods Answer the Wrong Questions

Jul 02, 2025

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Jul 02, 2025

Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Jun 24, 2025

Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion

Jun 10, 2025

Compute-Optimal LLMs Provably Generalize Better With Scale

Apr 21, 2025

When Should We Orchestrate Multiple Agents?

Mar 17, 2025

Deep Learning is Not So Mysterious or Different

Mar 03, 2025

Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes

Feb 06, 2025

Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

Dec 10, 2024