Tanmoy Bhattacharya

Los Alamos National Laboratory

Global explainability of a deep abstaining classifier

Apr 01, 2025

Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Benjamin H. McMahon, Trilce Estrada, Kumkum Ganguly, Adam Spannaus, John P. Gounley, Xiao-Cheng Wu, Eric B. Durbin, Heidi A. Hanson(+1 more)

Abstract: We present a global explainability method to characterize sources of errors in the histology prediction task of our real-world multitask convolutional neural network (MTCNN)-based deep abstaining classifier (DAC), for automated annotation of cancer pathology reports from NCI-SEER registries. Our classifier was trained and evaluated on 1.04 million hand-annotated samples and makes simultaneous predictions of cancer site, subsite, histology, laterality, and behavior for each report. The DAC framework enables the model to abstain on ambiguous reports and/or confusing classes to achieve a target accuracy on the retained (non-abstained) samples, but at the cost of decreased coverage. Requiring 97% accuracy on the histology task caused our model to retain only 22% of all samples, mostly the less ambiguous and common classes. Local explainability with the GradInp technique provided a computationally efficient way of obtaining contextual reasoning for thousands of individual predictions. Our method, involving dimensionality reduction of approximately 13000 aggregated local explanations, enabled global identification of sources of errors as hierarchical complexity among classes, label noise, insufficient information, and conflicting evidence. This suggests several strategies such as exclusion criteria, focused annotation, and reduced penalties for errors involving hierarchically related classes to iteratively improve our DAC in this complex real-world implementation.
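To make the local-explanation step more concrete, here is a minimal sketch of gradient-times-input attribution, the general idea behind GradInp. It assumes a toy logistic model as a stand-in for the MTCNN; the function name `grad_times_input` and the weights are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_times_input(weights: Sequence[float], bias: float,
                     x: Sequence[float]) -> List[float]:
    """Attribution_i = x_i * d f / d x_i for f(x) = sigmoid(w . x + b).

    Toy stand-in model (assumption): the paper's classifier is a CNN,
    where the gradient would come from automatic differentiation.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    # For a logistic model, d f / d x_i = p * (1 - p) * w_i.
    return [xi * p * (1.0 - p) * w for w, xi in zip(weights, x)]


# Hypothetical weights and input features, for illustration only.
weights = [2.0, -1.0, 0.0]
bias = 0.0
x = [1.0, 2.0, 5.0]
attr = grad_times_input(weights, bias, x)
```

Positive entries push the score up and negative entries push it down; a feature with zero weight gets zero attribution no matter how large its value is.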

An Effective Baseline for Robustness to Distributional Shift

May 15, 2021

Sunil Thulasidasan, Sushil Thapa, Sayera Dhaubhadel, Gopinath Chennupati, Tanmoy Bhattacharya, Jeff Bilmes

Abstract: Refraining from confidently predicting when faced with categories of inputs different from those seen during training is an important requirement for the safe deployment of deep learning systems. While simple to state, this has been a particularly challenging problem in deep learning, where models often end up making overconfident predictions in such situations. In this work we present a simple but highly effective approach to out-of-distribution detection that uses the principle of abstention: when encountering a sample from an unseen class, the desired behavior is to abstain from predicting. Our approach uses a network with an extra abstention class and is trained on a dataset that is augmented with an uncurated set that consists of a large number of out-of-distribution (OoD) samples that are assigned the label of the abstention class; the model is then trained to learn an effective discriminator between in- and out-of-distribution samples. We compare this relatively simple approach against a wide variety of more complex methods that have been proposed both for out-of-distribution detection and for uncertainty modeling in deep learning, and empirically demonstrate its effectiveness on a wide variety of benchmarks and deep architectures for image recognition and text classification, often outperforming existing approaches by significant margins. Given the simplicity and effectiveness of this method, we propose that this approach be used as a new additional baseline for future work in this domain.
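The data-side recipe described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `augment_with_ood` and `predict_or_abstain` are hypothetical, the samples are placeholders, and the abstention class is assumed to take the index just past the last in-distribution class.

```python
from typing import Any, List, Optional, Sequence, Tuple


def augment_with_ood(in_dist: Sequence[Tuple[Any, int]],
                     ood_inputs: Sequence[Any],
                     num_classes: int) -> Tuple[List[Tuple[Any, int]], int]:
    """Append OoD inputs, all labeled with one extra abstention class."""
    abstain_label = num_classes  # assumption: classes are 0..num_classes-1
    augmented = list(in_dist) + [(x, abstain_label) for x in ood_inputs]
    return augmented, abstain_label


def predict_or_abstain(probs: Sequence[float],
                       abstain_label: int) -> Optional[int]:
    """Return the argmax class, or None when the abstention class wins."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return None if k == abstain_label else k


# Placeholder samples: two in-distribution classes plus uncurated OoD inputs.
train = [("cat.png", 0), ("dog.png", 1)]
ood = ["noise.png", "texture.png"]
augmented, abstain_label = augment_with_ood(train, ood, num_classes=2)
```

A network trained on `augmented` would have `num_classes + 1` outputs; at inference time, `predict_or_abstain` treats a win for the last output as "do not predict".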

Uncertainty Bounds for Multivariate Machine Learning Predictions on High-Strain Brittle Fracture

Dec 23, 2020

Cristina Garcia-Cardona, M. Giselle Fernández-Godino, Daniel O'Malley, Tanmoy Bhattacharya

Abstract: Simulation of the crack network evolution in high-strain-rate impact experiments performed on brittle materials is very compute-intensive. The cost increases even more if multiple simulations are needed to account for the randomness in crack length, location, and orientation, which is inherently found in real-world materials. Constructing a machine learning emulator can make the process faster by orders of magnitude. There has been little work, however, on assessing the error associated with emulator predictions. Estimating these errors is imperative for meaningful overall uncertainty quantification. In this work, we extend the heteroscedastic uncertainty estimates to bound a multiple output machine learning emulator. We find that the response prediction is robust with a somewhat conservative estimate of uncertainty.
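As a rough illustration of heteroscedastic uncertainty for several outputs, the sketch below uses the standard Gaussian negative log-likelihood, in which the model predicts a mean and a log-variance per output. The abstract does not specify the loss, so this form, the function names, and the `k = 2` band width are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def heteroscedastic_nll(y: Sequence[float], mean: Sequence[float],
                        log_var: Sequence[float]) -> float:
    """Mean Gaussian NLL over outputs (additive constant dropped)."""
    terms = [0.5 * (lv + (yi - m) ** 2 / math.exp(lv))
             for yi, m, lv in zip(y, mean, log_var)]
    return sum(terms) / len(terms)


def prediction_bounds(mean: Sequence[float], log_var: Sequence[float],
                      k: float = 2.0) -> List[Tuple[float, float]]:
    """Per-output band mean +/- k * sigma, with sigma = exp(log_var / 2)."""
    return [(m - k * math.exp(0.5 * lv), m + k * math.exp(0.5 * lv))
            for m, lv in zip(mean, log_var)]


# Hypothetical two-output example.
loss = heteroscedastic_nll(y=[1.0, 2.0], mean=[1.0, 0.0], log_var=[0.0, 0.0])
bounds = prediction_bounds(mean=[1.0, 0.0], log_var=[0.0, 0.0])
```

Predicting the log-variance rather than the variance keeps the variance positive without any constraint on the network output.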

Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports

Sep 24, 2020

Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Kumkum Ganguly, Gopinath Chennupati, Sunil Thulasidasan, Nicolas Hengartner, Brent J. Mumphrey, Eric B. Durban, Jennifer A. Doherty, Mireille Lemieux(+6 more)

Abstract: Safe deployment of deep learning systems in critical real-world applications requires models to make few mistakes, and only under predictable circumstances. Development of such a model is not yet possible, in general. In this work, we address this problem with an abstaining classifier tuned to have >95% accuracy, and identify the determinants of abstention with LIME (the Local Interpretable Model-agnostic Explanations method). Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate our method in a multitask setting to classify cancer pathology reports from the NCI SEER cancer registries on six tasks of greatest importance. For these tasks, we reduce the classification error rate by factors of 2-5 by abstaining on 25-45% of the reports. For the specific case of cancer site, we are able to identify metastasis and reports involving lymph nodes as responsible for many of the classification mistakes, and that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks and greater than 85% for all six tasks on the retained samples. By using this information, we expect to define workflows that incorporate machine learning only in the areas where it is sufficiently robust and accurate, saving human attention for areas where it is required.
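The trade-off between abstention rate and error rate quoted above can be computed as in the following sketch. The predictions and labels are made-up toy values, `None` is an assumed marker for an abstained report, and `coverage_and_error` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_error(preds: Sequence[Optional[int]],
                       labels: Sequence[int]) -> Tuple[float, float]:
    """Fraction of retained samples, and error rate on those samples."""
    retained = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(retained) / len(labels)
    if not retained:
        return coverage, float("nan")
    error = sum(p != y for p, y in retained) / len(retained)
    return coverage, error


# Toy data: ten reports, three classes (illustrative values only).
labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
base_preds = [0, 1, 2, 0, 1, 2, 1, 2, 0, 1]           # never abstains
dac_preds = [0, 1, 2, 0, 1, 2, None, None, None, 1]   # abstains on three

base_cov, base_err = coverage_and_error(base_preds, labels)
dac_cov, dac_err = coverage_and_error(dac_preds, labels)
reduction_factor = base_err / dac_err
```

In this toy case, abstaining on 30% of the reports lowers the error rate on the retained reports from 0.4 to 1/7, a reduction factor of 2.8.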

On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks

May 27, 2019

Sunil Thulasidasan, Gopinath Chennupati, Jeff Bilmes, Tanmoy Bhattacharya, Sarah Michalak

Abstract: Mixup (Zhang et al., 2017) is a recently proposed method for training deep neural networks where additional samples are generated during training by convexly combining random pairs of images and their associated labels. While simple to implement, it has been shown to be a surprisingly effective method of data augmentation for image classification; DNNs trained with mixup show noticeable gains in classification performance on a number of image classification benchmarks. In this work, we discuss a hitherto untouched aspect of mixup training -- the calibration and predictive uncertainty of models trained with mixup. We find that DNNs trained with mixup are significantly better calibrated -- i.e., the predicted softmax scores are much better indicators of the actual likelihood of a correct prediction -- than DNNs trained in the regular fashion. We conduct experiments on a number of image classification architectures and datasets -- including large-scale datasets like ImageNet -- and find this to be the case. Additionally, we find that merely mixing features does not result in the same calibration benefit and that the label smoothing in mixup training plays a significant role in improving calibration. Finally, we also observe that mixup-trained DNNs are less prone to over-confident predictions on out-of-distribution and random-noise data. We conclude that the typical overconfidence seen in neural networks, even on in-distribution data, is likely a consequence of training with hard labels, suggesting that mixup training be employed for classification tasks where predictive uncertainty is a significant concern.
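The convex combination of inputs and labels can be illustrated with a short sketch. Sampling the mixing coefficient from a Beta(alpha, alpha) distribution follows the usual mixup recipe; the helper names and the tiny two-pixel "images" are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the mixing coefficient from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x1: Sequence[float], y1: Sequence[float],
               x2: Sequence[float], y2: Sequence[float],
               lam: float) -> Tuple[List[float], List[float]]:
    """Convexly combine two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Two toy samples with one-hot labels; lam is fixed here for clarity.
x_mix, y_mix = mixup_pair([1.0, 0.0], [1.0, 0.0],
                          [0.0, 1.0], [0.0, 1.0], lam=0.25)
lam_random = sample_lambda(alpha=0.4, rng=random.Random(0))
```

The mixed label `y_mix` is a soft target that still sums to one, which is the label-smoothing effect the abstract links to better calibration.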

Combating Label Noise in Deep Learning Using Abstention

May 27, 2019

Sunil Thulasidasan, Tanmoy Bhattacharya, Jeff Bilmes, Gopinath Chennupati, Jamal Mohd-Yusof

Abstract: We introduce a novel method to combat label noise when training deep neural networks for classification. We propose a loss function that permits abstention during training, thereby allowing the DNN to abstain on confusing samples while continuing to learn and improve classification performance on the non-abstained samples. We show how such a deep abstaining classifier (DAC) can be used for robust learning in the presence of different types of label noise. In the case of structured or systematic label noise -- where noisy training labels or confusing examples are correlated with underlying features of the data -- training with abstention enables representation learning for features that are associated with unreliable labels. In the case of unstructured (arbitrary) label noise, abstention during training enables the DAC to be used as an effective data cleaner by identifying samples that are likely to have label noise. We provide analytical results on the loss function behavior that enable dynamic adaptation of abstention rates based on learning progress during training. We demonstrate the utility of the deep abstaining classifier for various image classification tasks under different types of label noise; in the case of arbitrary label noise, we show significant improvements over previously published results on multiple image benchmarks.
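One way a loss can permit abstention is sketched below: the network gets an extra output whose probability scales down the usual cross-entropy, while a penalty weighted by `alpha` discourages abstaining on everything. The abstract does not state the formula, so this particular form, the name `abstention_loss`, and the choice of the last output as the abstention class are assumptions.

```python
import math
from typing import Sequence


def abstention_loss(probs: Sequence[float], target: int,
                    alpha: float) -> float:
    """Cross-entropy over real classes, scaled by (1 - p_abstain),
    plus alpha * log(1 / (1 - p_abstain)) as an abstention penalty.

    Assumes probs is a softmax output whose last entry is the
    abstention class and whose abstention probability is below 1.
    """
    p_abstain = probs[-1]
    keep = 1.0 - p_abstain
    # Renormalize the target probability over the non-abstention classes.
    cross_entropy = -math.log(probs[target] / keep)
    return keep * cross_entropy + alpha * math.log(1.0 / keep)


# Two real classes plus an abstention output (toy probabilities).
loss_no_abstain = abstention_loss([0.5, 0.5, 0.0], target=0, alpha=1.0)
loss_some_abstain = abstention_loss([0.5, 0.25, 0.25], target=0, alpha=1.0)
```

With zero abstention mass the expression reduces to ordinary cross-entropy; a larger `alpha` makes abstaining more expensive, which is the knob a training schedule could adjust.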

* ICML 2019

On the universal structure of human lexical semantics

Apr 29, 2015

Hyejin Youn, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. Wilkins, Ian Maddieson, William Croft, Tanmoy Bhattacharya

Abstract: How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and there emerge naturally interpretable clusters loosely connected to each other. Statistical analysis shows such structural properties are consistent across different language groups, largely independent of geography, environment, and literacy. It is therefore possible to conclude the conceptual structure connecting basic vocabulary studied is primarily due to universal features of human cognition and language use.
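The weighted network described above can be sketched as a count of how many languages link each pair of concepts through a shared word. The lexicon data here is invented for illustration, and counting each pair at most once per language is an assumption about how "frequency of polysemies" is tallied.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def polysemy_network(
    lexicons: Dict[str, Dict[str, List[str]]]
) -> Counter:
    """Edge weight = number of languages in which a single word
    expresses both concepts of the pair."""
    weights: Counter = Counter()
    for word_to_concepts in lexicons.values():
        linked: set = set()
        for concepts in word_to_concepts.values():
            for pair in combinations(sorted(set(concepts)), 2):
                linked.add(pair)
        weights.update(linked)  # each pair counted once per language
    return weights


# Invented toy lexicons: language -> word -> concepts it can express.
lexicons = {
    "lang_A": {"w1": ["SUN", "DAY"], "w2": ["MOON"]},
    "lang_B": {"u1": ["SUN", "DAY"], "u2": ["MOON", "MONTH"]},
    "lang_C": {"v1": ["MOON", "MONTH"], "v2": ["SUN"], "v3": ["DAY"]},
}
network = polysemy_network(lexicons)
```

Pairs that never share a word in any language simply have no edge, which is how loosely connected clusters can emerge from such counts.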

* PNAS 113 7 1766-1771 (2016)
* Press embargo in place until publication
