Hyunsoo Ha

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models

Sep 30, 2024
Via arXiv

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

Sep 15, 2024
Via arXiv