Dan Busbridge

Distillation Scaling Laws

Feb 12, 2025
Via arXiv

Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection

Feb 09, 2025
Via arXiv

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Jan 21, 2025
Via arXiv

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Sep 06, 2024
Via arXiv

Poly-View Contrastive Learning

Mar 08, 2024
Via arXiv

Bootstrap Your Own Variance

Dec 06, 2023
Via arXiv

REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation

Sep 07, 2023
Via arXiv

How to Scale Your EMA

Jul 27, 2023
Via arXiv

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Jul 20, 2023
Via arXiv

DUET: 2D Structured and Approximately Equivariant Representations

Jun 30, 2023
Via arXiv