Davide Capuzzo

Enabling Synthetic Data adoption in regulated domains

Apr 13, 2022

Giorgio Visani, Giacomo Graffi, Mattia Alfero, Enrico Bagli, Davide Capuzzo, Federico Chesani

Figures 1 to 4 for Enabling Synthetic Data adoption in regulated domains

Abstract: The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than on algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, creating a crucial trade-off between data quality and privacy. A clever way to bypass this conundrum relies on Synthetic Data: data obtained from a generative process that learns the properties of the real data. Both Academia and Industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data, and the best model has been identified by applying DAISYnt to the different synthetic replicas. Further potential uses include, among others, auditing and fine-tuning generative models or ensuring the high quality of a given synthetic dataset. Finally, from a prescriptive viewpoint, DAISYnt may pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
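The abstract does not detail DAISYnt's tests, so what follows is only a hedged sketch of one common kind of fidelity check that a synthetic data evaluation suite might include: the two-sample Kolmogorov-Smirnov (KS) statistic, comparing a real column with its synthetic counterpart. The function names `ks_statistic` and `fidelity_report`, and the dict-of-lists table layout, are illustrative assumptions, not part of DAISYnt.

```python
from bisect import bisect_right

# Illustrative sketch only: NOT DAISYnt's implementation (the abstract does not describe it).

def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap between
    the empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    a, b = sorted(real), sorted(synthetic)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that jump only at observed values,
    # so the maximum gap is reached at one of those values.
    for x in a + b:
        fa = bisect_right(a, x) / n  # fraction of real values <= x
        fb = bisect_right(b, x) / m  # fraction of synthetic values <= x
        d = max(d, abs(fa - fb))
    return d


def fidelity_report(real_table, synthetic_table):
    """KS statistic for every column present in both tables (hypothetical
    dict-of-lists layout: column name -> list of numeric values)."""
    return {
        col: ks_statistic(real_table[col], synthetic_table[col])
        for col in real_table
        if col in synthetic_table
    }


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A per-column report of this kind only checks marginal distributions; dependencies between columns and privacy leakage would need separate tests.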

Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes

Dec 30, 2020

Giorgio Visani, Federico Chesani, Enrico Bagli, Davide Capuzzo, Alessandro Poluzzi

Figures 1 to 4 for Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes

Abstract: In the global economy, credit companies play a central role in economic development through their activity as money lenders. This important task comes with some drawbacks, mainly the risk that debtors will not be able to repay the provided credit. Therefore, Credit Risk Modelling (CRM), namely the evaluation of the probability that a debtor will not repay the amount due, plays a paramount role. Statistical approaches have long been exploited successfully, becoming the most used methods for CRM. Recently, machine and deep learning techniques have also been applied to the CRM task, showing a significant increase in prediction quality and performance. However, such techniques usually do not provide reliable explanations for the scores they produce. As a consequence, many machine and deep learning techniques fail to comply with western countries' regulations such as the GDPR. In this paper we suggest using the LIME (Local Interpretable Model-agnostic Explanations) technique to tackle the explainability problem in this field. We show its application on a real credit-risk dataset and finally discuss its soundness and the improvements necessary to guarantee its adoption and compliance with the task.
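The core idea of LIME can be sketched as follows: perturb the instance to be explained, query the black-box model on the perturbed points, weight each point by its proximity to the instance, and fit a weighted linear surrogate whose coefficients act as the local explanation. This is a simplified, stdlib-only illustration under stated assumptions (Gaussian perturbations around the instance, an exponential proximity kernel, and the hypothetical helpers `lime_explain` and `solve`). It is neither the `lime` Python package, which for tabular data derives its perturbations from training-data statistics, nor the exact setup used in the paper.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lime_explain(predict, x0, n_samples=500, sigma=1.0, kernel_width=1.0, seed=0):
    """Hypothetical LIME-style sketch: return [intercept, w_1, ..., w_d] of a
    weighted linear surrogate fitted around instance x0.

    Assumptions: perturbations are x0 + N(0, sigma^2) noise per feature, and
    the proximity weight is exp(-||z - x0||^2 / kernel_width^2)."""
    rng = random.Random(seed)
    d = len(x0)
    XtWX = [[0.0] * (d + 1) for _ in range(d + 1)]
    XtWy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, sigma) for xi in x0]  # perturbed point
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        w = math.exp(-dist2 / kernel_width ** 2)  # closer points weigh more
        row = [1.0] + z  # leading 1.0 for the intercept
        y = predict(z)  # black-box score, e.g. a default probability
        # Accumulate the weighted normal equations X^T W X beta = X^T W y.
        for i in range(d + 1):
            XtWy[i] += w * row[i] * y
            for j in range(d + 1):
                XtWX[i][j] += w * row[i] * row[j]
    return solve(XtWX, XtWy)


# Toy black box that is exactly linear: the surrogate should recover 3, 2, -1.
coefs = lime_explain(lambda z: 3.0 + 2.0 * z[0] - 1.0 * z[1], [1.0, 2.0])
print([round(c, 6) for c in coefs])
```

For a nonlinear model the coefficients describe only the local behaviour around `x0`, and they change with `sigma`, `kernel_width` and the random seed; this sensitivity is what motivates checking the stability of LIME explanations.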

Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Jan 31, 2020

Giorgio Visani, Enrico Bagli, Federico Chesani, Alessandro Poluzzi, Davide Capuzzo

Figures 1 to 4 for Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Abstract: Nowadays we are witnessing a transformation of business processes towards a more computation-driven approach. The ever-increasing usage of Machine Learning techniques is the clearest example of this trend. This sort of revolution often provides advantages, such as an increase in prediction accuracy and a reduced time to obtain results. However, these methods present a major drawback: it is very difficult to understand on what grounds the algorithm took its decision. To address this issue we consider the LIME method. We give a general background on LIME, then focus on the stability issue: applying the method repeatedly, under the same conditions, may yield different explanations. Two complementary indices are proposed to measure LIME stability. It is important for the practitioner to be aware of the issue, as well as to have a tool for spotting it. Stability guarantees that LIME explanations are reliable; therefore a stability assessment, made through the proposed indices, is crucial. As a case study, we apply both Machine Learning and classical statistical techniques to Credit Risk data. We test LIME on the Machine Learning algorithm and check its stability. Finally, we examine the goodness of the explanations returned.
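The abstract does not define its two indices, so the sketch below does not reproduce them. Under stated assumptions, it illustrates two complementary ways one could quantify the repeat-run stability of any explainer that returns one coefficient per feature: agreement of the selected top-k features across runs (mean pairwise Jaccard similarity) and the spread of each coefficient across runs (sample standard deviation). The names `selection_stability`, `coefficient_spread` and the stand-in `noisy_explainer` are hypothetical.

```python
import random
import statistics
from itertools import combinations

# Hypothetical stability measures: NOT the two indices proposed in the paper.


def top_k(coefs, k):
    """Indices of the k features with the largest absolute coefficient."""
    return frozenset(sorted(range(len(coefs)), key=lambda i: -abs(coefs[i]))[:k])


def selection_stability(runs, k):
    """Mean pairwise Jaccard similarity of the top-k feature sets across
    repeated runs (1.0 = every run selects the same features)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    sets = [top_k(c, k) for c in runs]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def coefficient_spread(runs):
    """Sample standard deviation of each feature's coefficient across runs."""
    return [statistics.stdev(col) for col in zip(*runs)]


def noisy_explainer(true_coefs, noise, seed):
    """Stand-in for one explainer run: true coefficients plus Gaussian noise."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, noise) for c in true_coefs]


true = [2.0, -1.5, 0.1, 0.05]
stable = [noisy_explainer(true, 0.01, s) for s in range(10)]
unstable = [noisy_explainer(true, 1.0, s) for s in range(10)]
print(selection_stability(stable, 2))  # 1.0: features 0 and 1 are always selected
print(selection_stability(unstable, 2))
print(coefficient_spread(unstable))
```

In practice each run would be a fresh LIME call on the same instance with a different random seed; a low Jaccard similarity or a large coefficient spread signals that a single explanation should not be trusted on its own.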
