Alexandra Chouldechova

Effects of Generative AI Errors on User Reliance Across Task Difficulty

Apr 05, 2026
Via arXiv

Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming

Jan 26, 2026
Via arXiv

Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Jun 04, 2025
Via arXiv

Taxonomizing Representational Harms using Speech Act Theory

Apr 01, 2025
Via arXiv

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels

Mar 07, 2025
Via arXiv

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

Dec 09, 2024
Via arXiv

A Framework for Evaluating LLMs Under Task Indeterminacy

Nov 21, 2024
Via arXiv

SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation

Nov 14, 2024
Via arXiv

A structured regression approach for evaluating model performance across intersectional subgroups

Jan 26, 2024
Via arXiv

The Impact of Differential Feature Under-reporting on Algorithmic Fairness

Jan 16, 2024
Via arXiv