Max Cembalest

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Mar 23, 2023

Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson

Abstract: As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy: an additional term that measures the difference in feature attribution between a pair of explainers. We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and that consensus also improves between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. Finally, we study the influence our method has on feature attribution explanations.
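
To make the idea of a two-term objective concrete, here is a minimal sketch of a loss that combines a task term with an explainer-disagreement term. It is not the paper's implementation. Everything specific is an assumption made for illustration: the logistic-regression model, the two explainers (plain gradient and input-times-gradient), the use of one minus Pearson correlation as the disagreement measure, and the weighting parameter `lam`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Hypothetical model: logistic regression on a single input vector x.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient_attribution(w, b, x):
    # Explainer 1 (assumed): gradient of the output with respect to each feature.
    p = predict(w, b, x)
    return [p * (1.0 - p) * wi for wi in w]

def input_x_gradient_attribution(w, b, x):
    # Explainer 2 (assumed): each feature value times its gradient.
    return [xi * gi for xi, gi in zip(x, gradient_attribution(w, b, x))]

def pearson(a, b):
    # Pearson correlation between two attribution vectors; 0.0 if either is constant.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0

def disagreement(w, b, x):
    # Assumed disagreement measure: 0 when the explainers are perfectly correlated.
    return 1.0 - pearson(gradient_attribution(w, b, x),
                         input_x_gradient_attribution(w, b, x))

def combined_loss(w, b, x, y, lam=0.5):
    # Task term: binary cross-entropy. lam is a hypothetical trade-off weight.
    p = predict(w, b, x)
    eps = 1e-12
    task = -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return (1.0 - lam) * task + lam * disagreement(w, b, x)

if __name__ == "__main__":
    w, b = [1.0, 2.0, 3.0], 0.0
    print(combined_loss(w, b, [1.0, -1.0, 1.0], y=1, lam=0.5))
```

In this sketch, setting `lam` to 0 recovers the plain task loss and raising it puts more weight on agreement between the two explainers, which mirrors the trade-off between consensus and model performance that the abstract mentions. Training would require differentiating this loss with respect to `w` and `b`, which an automatic differentiation library would normally handle.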

Tensions Between the Proxies of Human Values in AI

Dec 14, 2022

Teresa Datta, Daniel Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson

Abstract: Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g. privacy, fairness, and model transparency. Yet, we argue this is fundamentally misguided because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while giving the guise that those values are sufficiently embedded in our technologies. Under popularized methods, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this position paper, we push for redirection. We argue that the AI community needs to consider all the consequences of choosing certain formulations of these pillars -- not just the technical incompatibilities, but also the effects within the context of deployment. We point towards sociotechnical research for frameworks for the latter, but push for broader efforts into implementing these in practice.

* Contributed Talk, NeurIPS 2022 Workshop on Algorithmic Fairness through the Lens of Causality and Privacy; To be published in 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)
