Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liyun He-Guelton

Transfer Learning for Credit Card Fraud Detection: A Journey from Research to Production

Jul 20, 2021

Wissam Siblini, Guillaume Coter, Rémy Fabry, Liyun He-Guelton, Frédéric Oblé, Bertrand Lebichot, Yann-Aël Le Borgne, Gianluca Bontempi

Figure 1 for Transfer Learning for Credit Card Fraud Detection: A Journey from Research to Production

Abstract:The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state of the art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business.

Via

Access Paper or Ask Questions

Master your Metrics with Calibration

Sep 06, 2019

Wissam Siblini, Jordan Fréry, Liyun He-Guelton, Frédéric Oblé, Yi-Qing Wang

Figure 1 for Master your Metrics with Calibration

Figure 2 for Master your Metrics with Calibration

Figure 3 for Master your Metrics with Calibration

Figure 4 for Master your Metrics with Calibration

Abstract:Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as F1-score or AUC-PR (Area Under the Curve of Precision Recall). Heavily dependent on the class prior, such metrics may sometimes lead to wrong conclusions about the performance. For example, when dealing with non-stationary data streams, they do not allow the user to discern the reasons why a model performance varies across different periods. In this paper, we propose a way to calibrate the metrics so that they are no longer tied to the class prior. It corresponds to a readjustment, based on probabilities, to the value that the metric would have if the class prior was equal to a reference prior (user parameter). We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of calibrated metrics and show that they improve interpretability and provide a better control over what is really measured. We describe specific real-world use-cases where calibration is beneficial such as, for instance, model monitoring in production, reporting, or fairness evaluation.

Via

Access Paper or Ask Questions

Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Sep 03, 2019

Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Liyun He-Guelton, Olivier Caelen, Michael Granitzer, Sylvie Calabretto

Figure 1 for Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Figure 2 for Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Figure 3 for Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Figure 4 for Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Abstract:Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely (i) The sequence contains or doesn't contain a fraud (ii) The sequence is obtained by fixing the card-holder or the payment terminal (iii) It is a sequence of spent amount or of elapsed time between the current and previous transactions. Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sequences is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. Our multiple perspectives HMM-based approach offers automated feature engineering to model temporal correlations so as to improve the effectiveness of the classification task and allows for an increase in the detection of fraudulent transactions when combined with the state of the art expert based feature engineering strategy for credit card fraud detection. In extension to previous works, we show that this approach goes beyond ecommerce transactions and provides a robust feature engineering over different datasets, hyperparameters and classifiers. Moreover, we compare strategies to deal with structural missing values.

* published in the journal "future generation computer systems", in the special issue: "data exploration in the web 3.0 age"

Via

Access Paper or Ask Questions

Dataset shift quantification for credit card fraud detection

Jun 17, 2019

Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Sylvie Calabretto, Liyun He-Guelton, Frederic Oblé, Michael Granitzer

Figure 1 for Dataset shift quantification for credit card fraud detection

Figure 2 for Dataset shift quantification for credit card fraud detection

Figure 3 for Dataset shift quantification for credit card fraud detection

Figure 4 for Dataset shift quantification for credit card fraud detection

Abstract:Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify day-by-day the dataset shift in our face-to-face credit card transactions dataset (card holder located in the shop) . In practice, we classify the days against each other and measure the efficiency of the classification. The more efficient the classification, the more different the buying behaviour between two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, week-ends, etc). We then incorporate this dataset shift knowledge in the credit card fraud detection task as a new feature. This leads to a small improvement of the detection.

* Presented at IEEE Artificial Intelligence and Knowledge Engineering (AIKE 2019)

Via

Access Paper or Ask Questions

Multiple perspectives HMM-based feature engineering for credit card fraud detection

May 15, 2019

Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Olivier Caelen, Liyun He-Guelton, Sylvie Calabretto, Michael Granitzer

Figure 1 for Multiple perspectives HMM-based feature engineering for credit card fraud detection

Figure 2 for Multiple perspectives HMM-based feature engineering for credit card fraud detection

Abstract:Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a Fraud? (ii) Is the sequence obtained by fixing the card-holder or the payment terminal? (iii) Is it a sequence of spent amount or of elapsed time between the current and previous transactions? Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sets is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. This multiple perspectives HMM-based approach enables an automatic feature engineering in order to model the sequential properties of the dataset with respect to the classification task. This strategy allows for a 15% increase in the precision-recall AUC compared to the state of the art feature engineering strategy for credit card fraud detection.

* Presented as a poster in the conference SAC 2019: 34th ACM/SIGAPP Symposium on Applied Computing in April 2019

Via

Access Paper or Ask Questions