Guijin Son

Multi-Step Reasoning in Korean and the Emergent Mirage

Jan 10, 2025
Via arXiv

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap

Jan 05, 2025
Via arXiv

Improving Fine-grained Visual Understanding in VLMs through Text-Only Training

Dec 17, 2024
Via arXiv

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

Oct 23, 2024
Via arXiv

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do

Sep 17, 2024
Via arXiv

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024
Via arXiv

ESG Classification by Implicit Rule Learning via GPT-4

Mar 22, 2024
Via arXiv

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Feb 18, 2024
Via arXiv

KMMLU: Measuring Massive Multitask Language Understanding in Korean

Feb 18, 2024
Via arXiv

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

Sep 15, 2023
Via arXiv