Pierre-Carl Langlais

Towards Best Practices for Open Datasets for LLM Training

Jan 14, 2025
Via arXiv

Toxicity of the Commons: Curating Open-Source Pre-Training Data

Oct 29, 2024
Via arXiv