Benjamin Elder

Assessment of Prediction Intervals Using Uncertainty Characteristics Curves

Oct 04, 2023

Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri

Abstract: Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference to derive a novel operating-point-agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.

* Published at Workshop on Distribution-Free Uncertainty Quantification, International Conference on Machine Learning (ICML), July 2022. arXiv admin note: substantial text overlap with arXiv:2106.00858
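
As a rough illustration of why a single operating point can be limiting, the sketch below sweeps a scale factor over a set of symmetric prediction intervals and records the miss rate and mean bandwidth at each setting. This is not the paper's algorithm: the function name `interval_operating_points`, the symmetric-interval form, and the scaling scheme are assumptions made for this example, and the gain over a null reference mentioned in the abstract is not computed here.

```python
from typing import List, Sequence, Tuple


def interval_operating_points(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    half_width: Sequence[float],
    scales: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (scale, miss_rate, mean_bandwidth) for each scale factor.

    Hypothetical helper: each interval is y_pred +/- scale * half_width.
    A "miss" is a target that falls strictly outside its interval.
    """
    n = len(y_true)
    points = []
    for s in scales:
        # Count targets that fall outside the rescaled interval.
        misses = sum(
            1
            for y, p, w in zip(y_true, y_pred, half_width)
            if abs(y - p) > s * w
        )
        # Full interval width is twice the (scaled) half width.
        mean_bandwidth = sum(2.0 * s * w for w in half_width) / n
        points.append((s, misses / n, mean_bandwidth))
    return points


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.5, 2.0, 2.0, 4.0]
    half_width = [1.0, 1.0, 1.0, 1.0]
    for scale, miss_rate, bandwidth in interval_operating_points(
        y_true, y_pred, half_width, scales=[0.25, 0.5, 1.0]
    ):
        print(f"scale={scale:.2f} miss_rate={miss_rate:.2f} bandwidth={bandwidth:.2f}")
```

Each scale factor yields one (miss rate, bandwidth) pair. Comparing two methods at only one such pair can hide how they trade off elsewhere, which is the motivation for looking at the whole curve.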

Uncertainty Characteristics Curves: A Systematic Assessment of Prediction Intervals

Jun 01, 2021

Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri

Abstract: Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to a specific operating point, making evaluation and comparison across different studies difficult. Our work leverages (1) the concept of operating characteristics curves and (2) the notion of a gain over a simple reference to derive a novel operating-point-agnostic assessment methodology for prediction intervals. The paper describes the corresponding algorithm, provides a theoretical analysis, and demonstrates its utility in multiple scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.

* 10 pages main paper, 9 pages appendix

Learning Prediction Intervals for Model Performance

Dec 15, 2020

Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil

Abstract: Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions have prevented their widespread adoption. We address this core problem of performance prediction uncertainty with a method to compute prediction intervals for model performance. Our methodology uses transfer learning to train an uncertainty model to estimate the uncertainty of model performance predictions. We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. We believe this result makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.

* 7+6 pages, 5 figures, AAAI 2021

Not Your Grandfathers Test Set: Reducing Labeling Effort for Testing

Jul 10, 2020

Begum Taskazan, Jiri Navratil, Matthew Arnold, Anupama Murthi, Ganesh Venkataraman, Benjamin Elder

Abstract: Building and maintaining high-quality test sets remains a laborious and expensive task. As a result, test sets in the real world are often not properly kept up to date and drift from the production traffic they are supposed to represent. The frequency and severity of this drift raise serious concerns over the value of manually labeled test sets in the QA process. This paper proposes a simple but effective technique that drastically reduces the effort needed to construct and maintain a high-quality test set (reducing labeling effort by 80-100% across a range of practical scenarios). This result encourages a fundamental rethinking of the testing process by both practitioners, who can use these techniques immediately to improve their testing, and researchers, who can help address many of the open questions raised by this new approach.

* International Workshop on Challenges in Deploying and Monitoring Machine Learning Systems in Conjunction with ICML 2020

Uncertainty Prediction for Deep Sequential Regression Using Meta Models

Jul 02, 2020

Jiri Navratil, Matthew Arnold, Benjamin Elder

Abstract: Generating high-quality uncertainty estimates for sequential regression, particularly deep recurrent networks, remains a challenging and open problem. Existing approaches often make restrictive assumptions (such as stationarity) yet still perform poorly in practice, particularly in the presence of real-world non-stationary signals and drift. This paper describes a flexible method that can generate symmetric and asymmetric uncertainty estimates, makes no assumptions about stationarity, and outperforms competitive baselines on both drift and non-drift scenarios. This work helps make sequential regression more effective and practical for use in real-world applications, and is a powerful new addition to the modeling toolbox for sequential uncertainty quantification in general.

* 8 pages main paper + 11 pages appendix/references; 10 figures

Towards Automating the AI Operations Lifecycle

Mar 28, 2020

Matthew Arnold, Jeffrey Boston, Michael Desmond, Evelyn Duesterwald, Benjamin Elder, Anupama Murthi, Jiri Navratil, Darrell Reimer

Abstract: Today's AI deployments often require significant human involvement and skill in the operational stages of the model lifecycle, including pre-release testing, monitoring, problem diagnosis and model improvements. We present a set of enabling technologies that can be used to increase the level of automation in AI operations, thus lowering the human effort required. Since a common source of human involvement is the need to assess the performance of deployed models, we focus on technologies for performance prediction and KPI analysis and show how they can be used to improve automation in the key stages of a typical AI operations pipeline.

NeuNetS: An Automated Synthesis Engine for Neural Network Design

Jan 17, 2019

Atin Sood, Benjamin Elder, Benjamin Herta, Chao Xue, Costas Bekas, A. Cristiano I. Malossi, Debashish Saha, Florian Scheidegger, Ganesh Venkataraman, Gegi Thomas (+10 more)

Abstract: The application of neural networks to a vast variety of practical problems is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, and the capability to custom-train pre-built neural network architectures with customer data, have made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While prebuilt network models exist for certain scenarios, meeting the constraints that are unique to each application requires AI teams to develop custom neural network architectures that balance accuracy against memory footprint within the tight constraints of their use cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS: an automated Neural Network Synthesis engine for custom neural network design that is available as part of IBM's AI OpenScale product. NeuNetS is available for both Text and Image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of human-designed AI models.

* 14 pages, 12 figures. arXiv admin note: text overlap with arXiv:1806.00250
