Picture for David Brandfonbrener

David Brandfonbrener

GQ-VAE: A gated quantized VAE for learning variable length tokens

Add code
Dec 26, 2025
Viaarxiv icon

Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs

Add code
Dec 15, 2025
Viaarxiv icon

The Role of Sparsity for Length Generalization in Transformers

Add code
Feb 24, 2025
Figure 1 for The Role of Sparsity for Length Generalization in Transformers
Figure 2 for The Role of Sparsity for Length Generalization in Transformers
Figure 3 for The Role of Sparsity for Length Generalization in Transformers
Figure 4 for The Role of Sparsity for Length Generalization in Transformers
Viaarxiv icon

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Add code
Nov 19, 2024
Figure 1 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 2 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 3 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 4 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Viaarxiv icon

Mixture of Parrots: Experts improve memorization more than reasoning

Add code
Oct 24, 2024
Figure 1 for Mixture of Parrots: Experts improve memorization more than reasoning
Figure 2 for Mixture of Parrots: Experts improve memorization more than reasoning
Figure 3 for Mixture of Parrots: Experts improve memorization more than reasoning
Figure 4 for Mixture of Parrots: Experts improve memorization more than reasoning
Viaarxiv icon

SOAP: Improving and Stabilizing Shampoo using Adam

Add code
Sep 17, 2024
Viaarxiv icon

Deconstructing What Makes a Good Optimizer for Language Models

Add code
Jul 10, 2024
Viaarxiv icon

Universal Length Generalization with Turing Programs

Add code
Jul 03, 2024
Figure 1 for Universal Length Generalization with Turing Programs
Figure 2 for Universal Length Generalization with Turing Programs
Figure 3 for Universal Length Generalization with Turing Programs
Figure 4 for Universal Length Generalization with Turing Programs
Viaarxiv icon

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

Add code
Jun 15, 2024
Viaarxiv icon

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Add code
Feb 22, 2024
Figure 1 for Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Figure 2 for Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Figure 3 for Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Figure 4 for Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Viaarxiv icon