Sean Hughes

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Dec 05, 2024
Via arXiv

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024
Via arXiv

The BigCode Project Governance Card

Dec 06, 2023
Via arXiv

StarCoder: may the source be with you!

May 09, 2023
Via arXiv

SantaCoder: don't reach for the stars!

Jan 09, 2023
Via arXiv

The Stack: 3 TB of permissively licensed source code

Nov 20, 2022
Via arXiv