Joseph Jennings

Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents

Feb 06, 2025
Via arXiv

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Dec 03, 2024
Via arXiv

Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Jul 08, 2024
Via arXiv

Nemotron-4 340B Technical Report

Jun 17, 2024
Via arXiv

Nemotron-4 15B Technical Report

Feb 27, 2024
Via arXiv