Dan Roth

Shammie

Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

Nov 11, 2024

Benchmarking LLM Guardrails in Handling Multilingual Toxicity

Oct 29, 2024

ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Oct 24, 2024

Open Domain Question Answering with Conflicting Contexts

Oct 16, 2024

GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation

Oct 11, 2024

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Oct 03, 2024

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Sep 16, 2024

MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

Aug 30, 2024

Knowledge-Aware Reasoning over Multimodal Semi-structured Tables

Aug 25, 2024

Enhancing Temporal Understanding in LLMs for Semi-structured Tables

Jul 22, 2024