Abstract: Conformal prediction makes it possible to define reliable and robust learning algorithms, but it is essentially a method for evaluating whether an algorithm is good enough to be used in practice. To build reliability into a classification framework from the very beginning of its design, the concept of scalable classifier was introduced, generalizing the classical notion of classifier by linking it to statistical order theory and probabilistic learning theory. In this paper, we analyze the similarities between scalable classifiers and conformal prediction by introducing a new definition of score function and by defining a special set of input variables, the conformal safety set, which identifies the patterns in the input space that satisfy the error coverage guarantee, i.e., for which the probability of observing the wrong (possibly unsafe) label for points belonging to this set is bounded by a predefined error level $\varepsilon$. We demonstrate the practical implications of this framework through a cybersecurity application: the identification of DNS tunneling attacks. Our work contributes to the development of probabilistically robust and reliable machine learning models.
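In symbols (the notation below is our own shorthand, not necessarily the paper's): if $\mathcal{S}_\varepsilon$ denotes the conformal safety set and $(X, Y)$ a new exchangeable input-label pair, one natural reading of the error coverage guarantee is
\[
\mathbb{P}\big(Y \text{ is the wrong (unsafe) label} \,\big|\, X \in \mathcal{S}_\varepsilon\big) \leq \varepsilon ,
\]
i.e., conditionally on an input falling inside the set, an error is observed with probability at most $\varepsilon$.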
Abstract: Supervised classification recognizes patterns in the data to separate classes of behaviours. Canonical solutions contain misclassification errors that are intrinsic to the numerical, approximating nature of machine learning. The data analyst may minimize the classification error on one class at the expense of increasing the error on the other classes. Error control in this design phase is often handled in a heuristic manner. In this context, it is key to develop theoretical foundations capable of providing probabilistic certifications for the obtained classifiers. From this perspective, we introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled. The notion of scalable classifiers is then exploited to link the tuning of machine learning with error control. Several tests corroborate the approach: they are carried out on synthetic data, in order to highlight all the steps involved, as well as on a smart mobility application.
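A minimal sketch of the scalable-classifier idea, under illustrative assumptions: a base decision function is shifted by a scalar parameter, and the shift is tuned on calibration data so that the empirical error inside the declared-safe region stays below a target level. The label convention, function names, and the purely empirical tuning rule below are our assumptions; the paper derives a probabilistic certification rather than a plain empirical check.

# Illustrative sketch, not the paper's exact construction.
# Assumed label convention: 1 = "unsafe", 0 = "safe".
import numpy as np
from sklearn.svm import SVC

def fit_scalable_classifier(X_train, y_train, X_cal, y_cal, epsilon=0.05):
    y_cal = np.asarray(y_cal)
    base = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    scores = base.decision_function(X_cal)  # larger score -> more "unsafe"
    # Scan candidate offsets rho from the most permissive (largest safe region) down:
    # a point is declared "safe" when score - rho < 0.
    for rho in np.sort(scores)[::-1]:
        declared_safe = scores - rho < 0
        if declared_safe.sum() == 0:
            continue
        # Empirical error: fraction of declared-safe points that are actually unsafe.
        err = np.mean(y_cal[declared_safe] == 1)
        if err <= epsilon:
            return base, rho  # largest candidate safe region meeting the error target
    return base, None  # no offset satisfies the constraint

def predict_safe(base, rho, X):
    # True where the shifted classifier declares the input "safe".
    return base.decision_function(X) - rho < 0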
Abstract: Everyday life is increasingly influenced by artificial intelligence, and there is no question that machine learning algorithms must be designed to be reliable and trustworthy for everyone. Specifically, computer scientists consider an artificial intelligence system safe and trustworthy if it fulfills five pillars: explainability, robustness, transparency, fairness, and privacy. In addition to these five, we propose a sixth fundamental aspect: conformity, that is, the probabilistic assurance that the system will behave as the machine learner expects. In this paper, we propose a methodology to link conformal prediction with explainable machine learning by defining CONFIDERAI, a new score function for rule-based models that leverages both the predictive ability of the rules and the geometrical position of the points within the rule boundaries. We also address the problem of defining regions in the feature space where conformal guarantees are satisfied, by exploiting techniques based on support vector data description (SVDD) to control the number of non-conformal samples in conformal regions. The overall methodology is tested, with promising results, on benchmark and real datasets, such as DNS tunneling detection and cardiovascular disease prediction.
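For intuition, the conformal step can be sketched as standard split conformal prediction with a placeholder nonconformity score; the actual CONFIDERAI score, which combines a rule's predictive ability with the point's geometrical position inside the rule boundaries, is not reproduced here, and score_fn is only a stand-in.

# Split-conformal sketch with a placeholder nonconformity score (assumption).
import numpy as np

def split_conformal_threshold(cal_scores, epsilon=0.1):
    # Standard split-conformal quantile: with n calibration scores, the
    # ceil((n + 1) * (1 - epsilon))-th smallest score yields marginal coverage
    # of at least 1 - epsilon under exchangeability.
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    return np.sort(cal_scores)[min(k, n) - 1]

def prediction_set(score_fn, x, labels, threshold):
    # Include every candidate label whose nonconformity does not exceed the threshold.
    return [y for y in labels if score_fn(x, y) <= threshold]

In use, cal_scores would be score_fn(x_i, y_i) computed on a held-out calibration set, and the SVDD-based step described above would further shape the region of the feature space where the guarantee is claimed to hold.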