Jonathan Frankle

Does your data spark joy? Performance gains from domain upsampling at the end of training

Jun 05, 2024
Via arXiv

LoRA Learns Less and Forgets Less

May 15, 2024
Via arXiv

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

Mar 27, 2024
Via arXiv

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jan 16, 2024
Via arXiv

Dataset Difficulty and the Role of Inductive Bias

Jan 03, 2024
Via arXiv

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Dec 31, 2023
Via arXiv

CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

Oct 25, 2023
Via arXiv

Dynamic Masking Rate Schedules for MLM Pretraining

May 24, 2023
Via arXiv

Knowledge Distillation for Efficient Sequences of Training Runs

Mar 11, 2023
Via arXiv

The Effect of Data Dimensionality on Neural Network Prunability

Dec 01, 2022
Via arXiv