Yoo Yeon Sung

VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures

Mar 16, 2025
Via arXiv

GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration

Feb 27, 2025
Via arXiv

ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks

Jun 24, 2024
Via arXiv

How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation

Jan 20, 2024
Via arXiv

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines

Oct 20, 2023
Via arXiv