Sukyung Lee

LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models

Nov 18, 2024

Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs

Oct 16, 2024

Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models

Oct 07, 2024

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models

Sep 30, 2024

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

Sep 15, 2024

Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

May 31, 2024

Dataverse: Open-Source ETL Pipeline for Large Language Models

Mar 28, 2024

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Dec 29, 2023