Chanjun Park

Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance

Dec 10, 2024
Via arXiv

LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models

Nov 18, 2024
Via arXiv

Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching

Oct 24, 2024
Via arXiv

Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs

Oct 16, 2024
Via arXiv

Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models

Oct 07, 2024
Via arXiv

InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets

Oct 02, 2024
Via arXiv

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models

Sep 30, 2024
Via arXiv

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

Sep 15, 2024
Via arXiv

Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard

Sep 05, 2024
Via arXiv

ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

Jun 05, 2024
Via arXiv