Adam Perer

Over-Relying on Reliance: Towards Realistic Evaluations of AI-Based Clinical Decision Support

Apr 10, 2025

Venkatesh Sivaraman, Katelyn Morrison, Will Epperson, Adam Perer

Abstract: As AI-based clinical decision support (AI-CDS) is introduced in more and more aspects of healthcare services, HCI research plays an increasingly important role in designing for complementarity between AI and clinicians. However, current evaluations of AI-CDS often fail to capture when AI is and is not useful to clinicians. This position paper reflects on our work and influential AI-CDS literature to advocate for moving beyond evaluation metrics like Trust, Reliance, Acceptance, and Performance on the AI's task (what we term the "trap" of human-AI collaboration). Although these metrics can be meaningful in some simple scenarios, we argue that optimizing for them ignores important ways that AI falls short of clinical benefit, as well as ways that clinicians successfully use AI. As the fields of HCI and AI in healthcare develop new ways to design and evaluate CDS tools, we call on the community to prioritize ecologically valid, domain-appropriate study setups that measure the emergent forms of value that AI can bring to healthcare professionals.

* Accepted to the CHI '25 Workshop on Envisioning the Future of Interactive Health

How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models

Apr 10, 2024

Unnseo Park, Venkatesh Sivaraman, Adam Perer

Abstract: Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.

* 6 pages, 3 figures; accepted workshop paper at Time Series for Health @ ICLR 2024

The Impact of Imperfect XAI on Human-AI Decision-Making

Jul 25, 2023

Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, Adam Perer

Abstract: Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility for the explanations to be incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.

* 27 pages, 9 figures, 1 table, additional figures/table in the abstract

Improving Human-AI Collaboration With Descriptions of AI Behavior

Jan 06, 2023

Ángel Alexander Cabrera, Adam Perer, Jason I. Hong

Abstract: People work with AI systems to improve their decision making, but often under- or over-rely on AI predictions and perform worse than they would have unassisted. To help people appropriately rely on AI aids, we propose showing them behavior descriptions, details of how AI systems perform on subgroups of instances. We tested the efficacy of behavior descriptions through user studies with 225 participants in three distinct domains: fake review detection, satellite image classification, and bird classification. We found that behavior descriptions can increase human-AI accuracy through two mechanisms: helping people identify AI failures and increasing people's reliance on the AI when it is more accurate. These findings highlight the importance of people's mental models in human-AI collaboration and show that informing people of high-level AI behaviors can significantly improve AI-assisted decision making.

* Proc. ACM Hum.-Comput. Interact. 7, CSCW1, Article 136 (April 2023)
* 21 pages

"Public(s)-in-the-Loop": Facilitating Deliberation of Algorithmic Decisions in Contentious Public Policy Domains

Apr 22, 2022

Hong Shen, Ángel Alexander Cabrera, Adam Perer, Jason Hong

Abstract: This position paper offers a framework to think about how to better involve human influence in algorithmic decision-making on contentious public policy issues. Drawing from insights in communication literature, we introduce a "public(s)-in-the-loop" approach and enumerate three features that are central to this approach: publics as plural political entities, collective decision-making through deliberation, and the construction of publics. The paper explores how these features might advance our understanding of stakeholder participation in AI design in contentious public policy domains such as recidivism prediction. Finally, it sketches out part of a research agenda for the HCI community to support this work.

* 5 pages, 0 figure, accepted to CHI2020 Fair & Responsible AI Workshop

Improving Human-AI Partnerships in Child Welfare: Understanding Worker Practices, Challenges, and Desires for Algorithmic Decision Support

Apr 05, 2022

Anna Kawakami, Venkatesh Sivaraman, Hao-Fei Cheng, Logan Stapleton, Yanghuidi Cheng, Diana Qing, Adam Perer, Zhiwei Steven Wu, Haiyi Zhu, Kenneth Holstein

Abstract: AI-based decision support tools (ADS) are increasingly used to augment human decision-making in high-stakes, social contexts. As public sector agencies begin to adopt ADS, it is critical that we understand workers' experiences with these systems in practice. In this paper, we present findings from a series of interviews and contextual inquiries at a child welfare agency, conducted to understand how workers currently make AI-assisted child maltreatment screening decisions. Overall, we observe how workers' reliance upon the ADS is guided by (1) their knowledge of rich, contextual information beyond what the AI model captures, (2) their beliefs about the ADS's capabilities and limitations relative to their own, (3) organizational pressures and incentives around the use of the ADS, and (4) awareness of misalignments between algorithmic predictions and their own decision-making objectives. Drawing upon these findings, we discuss design implications towards supporting more effective human-AI decision-making.

* 2022 Conference on Human Factors in Computing Systems

Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces

Feb 16, 2022

Venkatesh Sivaraman, Yiwei Wu, Adam Perer

Abstract: Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. In order to characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help gain new insights into embedding space structure.

* 23 pages, 5 figures, 2 tables. To be presented at IUI'22. arXiv version updated Feb 16 2022 with corrected publication year and copyright

Characterizing Human Explanation Strategies to Inform the Design of Explainable AI for Building Damage Assessment

Nov 04, 2021

Donghoon Shin, Sachin Grover, Kenneth Holstein, Adam Perer

Abstract: Explainable AI (XAI) is a promising means of supporting human-AI collaboration in high-stakes visual detection tasks, such as damage detection from satellite imagery, as fully-automated approaches are unlikely to be perfectly safe and reliable. However, most existing XAI techniques are not informed by an understanding of humans' task-specific needs for explanations. Thus, we took a first step toward understanding what forms of XAI humans require in damage detection tasks. We conducted an online crowdsourced study to understand how people explain their own assessments when evaluating the severity of building damage based on satellite imagery. Through this study with 60 crowdworkers, we surfaced six major strategies that humans utilize to explain their visual damage assessments. We present implications of our findings for the design of XAI methods for such visual detection contexts, and discuss opportunities for future research.

* Accepted at NeurIPS 2021 Workshop on Artificial Intelligence for Humanitarian Assistance and Disaster Response (AI+HADR 2021)

Discovering and Validating AI Errors With Crowdsourced Failure Reports

Sep 23, 2021

Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer

Abstract: AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors. We also design and implement Deblinder, a visual analytics system for synthesizing failure reports that developers can use to discover and validate systematic failures. In semi-structured interviews and think-aloud studies with 10 AI practitioners, we explore the affordances of the Deblinder system and the applicability of failure reports in real-world settings. Lastly, we show how collecting additional data from the groups identified by developers can improve model performance.

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Mar 19, 2021

Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser

Abstract: Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.

* Accepted as a Systems Demonstration at NAACL-HLT 2021. Video demonstration at https://youtu.be/1xEEfsMwL0k
