François de la Bourdonnaye

Evaluating resampling methods on a real-life highly imbalanced online credit card payments dataset

Jun 27, 2022

François de la Bourdonnaye, Fabrice Daniel

Abstract: Many problems in machine-learning-based credit card fraud detection stem from the imbalance of transaction datasets. The number of frauds is tiny compared to the number of regular transactions, which has been shown to damage learning performance; at worst, the algorithm learns to classify all transactions as regular. Resampling methods and cost-sensitive approaches are known to be good candidates for addressing this imbalance. This paper evaluates numerous state-of-the-art resampling methods on a large real-life online credit card payments dataset. We show that they are inefficient, either because the methods are intractable or because the metrics do not exhibit substantial improvements. Our work contributes to this domain in that (1) we compare many state-of-the-art resampling methods on a large-scale dataset and (2) we use a real-life online credit card payments dataset.
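To make the idea of resampling concrete, here is a minimal sketch of random undersampling, one of the simplest methods in this family. It is not taken from the paper: the function name `random_undersample`, the `ratio` parameter, and the synthetic data are all assumptions made for illustration, and the paper's own dataset and method list are not reproduced here.

```python
import numpy as np

def random_undersample(X, y, ratio=1.0, seed=0):
    """Keep every minority row (label 1) and a random subset of majority rows (label 0).

    `ratio` is the desired majority-to-minority count after resampling
    (hypothetical parameter, chosen for this sketch).
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    # Never ask for more majority rows than exist.
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    kept_majority = rng.choice(majority_idx, size=n_keep, replace=False)
    idx = np.concatenate([minority_idx, kept_majority])
    rng.shuffle(idx)  # avoid having all frauds grouped at the start
    return X[idx], y[idx]

# Synthetic stand-in data: 10,000 transactions, 20 of them labelled as fraud.
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 5))
y = np.zeros(10_000, dtype=int)
y[rng.choice(10_000, size=20, replace=False)] = 1

X_res, y_res = random_undersample(X, y, ratio=1.0)
```

With `ratio=1.0` the resampled set is balanced, at the cost of discarding almost all regular transactions. Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.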

Evaluating categorical encoding methods on a real credit card fraud detection database

Dec 22, 2021

François de la Bourdonnaye, Fabrice Daniel

Abstract: Correctly dealing with categorical data in a supervised learning context is still a major issue. Furthermore, though some machine learning methods include built-in ways to deal with categorical features, it is unclear whether they bring any improvement and how they compare with the usual categorical encoding methods. In this paper, we describe several well-known categorical encoding methods that are based on target statistics and weight of evidence. We apply them to a large, real credit card fraud detection database. Then, we train state-of-the-art gradient boosting methods on the encoded databases and evaluate their performance. We show that categorical encoding methods generally bring substantial improvements compared with the absence of encoding. The contribution of this work is twofold: (1) we compare many state-of-the-art "lite" categorical encoding methods on a large-scale database and (2) we use a real credit card fraud detection database.
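As an illustration of what a weight-of-evidence encoding can look like, the sketch below maps each category to the log ratio of its share among positive (fraud) rows to its share among negative rows. This is one common formulation, not necessarily the one used in the paper: the sign convention, the additive `smoothing` term, the function name `woe_encoding`, and the toy data are all assumptions.

```python
import math
from collections import Counter

def woe_encoding(categories, labels, smoothing=0.5):
    """Return {category: weight of evidence} for a binary target.

    WoE(c) = ln( P(category = c | label = 1) / P(category = c | label = 0) ),
    with additive smoothing so that categories missing from one class
    do not cause a division by zero (the smoothing is an assumption of this sketch).
    """
    pos = Counter(c for c, t in zip(categories, labels) if t == 1)
    neg = Counter(c for c, t in zip(categories, labels) if t == 0)
    pos_total = sum(pos.values())
    neg_total = sum(neg.values())
    levels = set(categories)
    k = len(levels)
    table = {}
    for c in levels:
        p = (pos[c] + smoothing) / (pos_total + smoothing * k)
        n = (neg[c] + smoothing) / (neg_total + smoothing * k)
        table[c] = math.log(p / n)
    return table

# Toy data: a hypothetical "country" feature and a fraud label.
countries = ["FR", "FR", "US", "US", "US", "DE"]
is_fraud = [0, 0, 1, 0, 1, 0]

table = woe_encoding(countries, is_fraud)
encoded = [table[c] for c in countries]  # numeric feature for a gradient boosting model
```

Because the encoding is computed from the target, it should be fitted on training data only and then applied to validation and test rows; fitting it on the full dataset would leak label information into the features.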
