Seonghoon Yang

LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models

Nov 18, 2024

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models

Sep 30, 2024

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

Sep 15, 2024

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Dec 29, 2023