Benjamin Roth

Not All Explanations Simulate Equally: Comparing Verbalized Feature Attributions and Self-Generated Rationales

May 31, 2026
Via arXiv

Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

May 27, 2026
Via arXiv

Select or Project? Evaluating Lower-dimensional Vectors for LLM Training Data Explanations

Jan 23, 2026
Via arXiv

Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs

Jan 15, 2026
Via arXiv

Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations

Jan 12, 2026
Via arXiv

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Jan 12, 2026
Via arXiv

Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations

Jan 07, 2026
Via arXiv

Compact Example-Based Explanations for Language Models

Jan 07, 2026
Via arXiv

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions

Dec 14, 2025
Via arXiv

Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance

Aug 27, 2025
Via arXiv