Yun Luo

HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Sep 10, 2025
Via arXiv

Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Sep 04, 2025
Via arXiv

RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction

Feb 25, 2025
Via arXiv

PerSphere: A Comprehensive Framework for Multi-Faceted Perspective Retrieval and Summarization

Dec 17, 2024
Via arXiv

Task Calibration: Calibrating Large Language Models on Inference Tasks

Oct 24, 2024
Via arXiv

Keys to Robust Edits: from Theoretical Insights to Practical Advances

Oct 12, 2024
Via arXiv

OpenResearcher: Unleashing AI for Accelerated Scientific Research

Aug 13, 2024
Via arXiv

RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models

May 23, 2024
Via arXiv

Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

Apr 18, 2024
Via arXiv

RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models

Feb 22, 2024
Via arXiv