
Chaitanya Malaviya

Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

Nov 11, 2024
Via arXiv

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Jul 22, 2024
Via arXiv

DOLOMITES: Domain-Specific Long-Form Methodical Tasks

May 09, 2024
Via arXiv

Calibrating Large Language Models with Sample Consistency

Feb 21, 2024
Via arXiv

Pachinko: Patching Interpretable QA Models through Natural Language Feedback

Nov 16, 2023
Via arXiv

ExpertQA: Expert-Curated Questions and Attributed Answers

Sep 14, 2023
Via arXiv

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

May 19, 2023
Via arXiv

AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference

Feb 03, 2023
Via arXiv

Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models

Oct 24, 2022
Via arXiv

G-DAUG: Generative Data Augmentation for Commonsense Reasoning

Apr 24, 2020
Via arXiv