Jiahao Ying

Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law

Apr 10, 2025
Via arXiv

SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia

Feb 10, 2025
Via arXiv

EvoWiki: Evaluating LLMs on Evolving Knowledge

Dec 18, 2024
Via arXiv

Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

Aug 21, 2024
Via arXiv

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Jun 29, 2024
Via arXiv

QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

Jun 19, 2024
Via arXiv

A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

Jun 06, 2024
Via arXiv

Have Seen Me Before? Automating Dataset Updates Towards Reliable and Timely Evaluation

Feb 28, 2024
Via arXiv

Intuitive or Dependent? Investigating LLMs' Robustness to Conflicting Prompts

Oct 03, 2023
Via arXiv

Benchmarking Foundation Models with Language-Model-as-an-Examiner

Jun 07, 2023
Via arXiv