Luca Soldaini

Amazon Alexa Search

Bolmo: Byteifying the Next Generation of Language Models

Dec 17, 2025
Via arXiv

Olmo 3

Dec 15, 2025
Via arXiv

olmOCR 2: Unit Test Rewards for Document OCR

Oct 22, 2025
Via arXiv

Overview of the TREC 2024 NeuCLIR Track

Sep 17, 2025
Via arXiv

FlexOlmo: Open Language Models for Flexible Data Use

Jul 09, 2025
Via arXiv

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025
Via arXiv

Teaching Models to Understand (but not Generate) High-risk Data

May 05, 2025
Via arXiv

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025
Via arXiv

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025
Via arXiv

Automatic Detection of Research Values from Scientific Abstracts Across Computer Science Subfields

Feb 26, 2025
Via arXiv