Benjamin Roth

Select or Project? Evaluating Lower-dimensional Vectors for LLM Training Data Explanations

Jan 23, 2026
Via arXiv

Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs

Jan 15, 2026
Via arXiv

Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations

Jan 12, 2026
Via arXiv

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Jan 12, 2026
Via arXiv

Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations

Jan 07, 2026
Via arXiv

Compact Example-Based Explanations for Language Models

Jan 07, 2026
Via arXiv

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions

Dec 14, 2025
Via arXiv

Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance

Aug 27, 2025
Via arXiv

Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles

Jan 07, 2025
Via arXiv

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks

Sep 06, 2024
Via arXiv