Heuiseok Lim

Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models

Oct 09, 2025
Via arXiv

TORSO: Template-Oriented Reasoning Towards General Tasks

Sep 11, 2025
Via arXiv

NeedleChain: Measuring Intact Long-Context Reasoning Capability of Large Language Models

Jul 30, 2025
Via arXiv

From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems

Jul 10, 2025
Via arXiv

Cross-Lingual Optimization for Language Transfer in Large Language Models

May 20, 2025
Via arXiv

Semantic Aware Linear Transfer by Recycling Pre-trained Language Models for Cross-lingual Transfer

May 16, 2025
Via arXiv

MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation

Apr 23, 2025
Via arXiv

Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning

Apr 07, 2025
Via arXiv

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

Mar 25, 2025
Via arXiv

Call for Rigor in Reporting Quality of Instruction Tuning Data

Mar 04, 2025
Via arXiv