Seungone Kim

Bridging the Data Provenance Gap Across Text, Speech and Video

Dec 19, 2024

LLM-AS-AN-INTERVIEWER: Beyond Static Testing Through Dynamic LLM Evaluation

Dec 10, 2024

Evaluating Language Models as Synthetic Data Generators

Dec 04, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

Oct 23, 2024

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Oct 21, 2024

Better Instruction-Following Through Minimum Bayes Risk

Oct 07, 2024

Consent in Crisis: The Rapid Decline of the AI Data Commons

Jul 24, 2024

Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education

Jul 24, 2024

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024

Aligning to Thousands of Preferences via System Message Generalization

May 28, 2024