Abstract: The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassification of majority class samples and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our LoRAS algorithm on 28 publicly available datasets and show that drawing samples from an approximated data manifold of the minority class is the key to successful oversampling. We compared the performance of LoRAS, SMOTE, and several SMOTE extensions and observed that, for imbalanced datasets, LoRAS on average generates better Machine Learning (ML) models in terms of F1-score and Balanced Accuracy. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that LoRAS is a more effective oversampling technique, since it provides a better estimate of the mean of the underlying local data distribution of the minority class data space.
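
The following is a minimal, illustrative sketch of the LoRAS idea described above, not the authors' reference implementation: for each minority point, small Gaussian "shadowsamples" are drawn around its nearest minority neighbours, and a random convex (affine, weights summing to 1) combination of a few shadowsamples yields a synthetic point that stays close to the local minority data manifold. The function name and parameters (k, n_shadows, n_aff, sigma) are assumptions chosen for illustration.

    # Hypothetical sketch of LoRAS-style oversampling; parameter names are
    # illustrative assumptions, not the published interface.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def loras_sketch(X_min, n_samples, k=5, n_shadows=10, n_aff=4, sigma=0.05, seed=None):
        """Generate n_samples synthetic points from the minority data X_min (n x d)."""
        rng = np.random.default_rng(seed)
        n, d = X_min.shape
        k = min(k, n - 1)
        # k nearest minority neighbours of each minority point (plus the point itself).
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
        _, idx = nn.kneighbors(X_min)

        synthetic = []
        for _ in range(n_samples):
            i = rng.integers(n)
            neighbourhood = X_min[idx[i]]                       # (k+1, d)
            # Gaussian shadowsamples around each point of the local neighbourhood.
            shadows = (np.repeat(neighbourhood, n_shadows, axis=0)
                       + rng.normal(scale=sigma, size=((k + 1) * n_shadows, d)))
            # Random convex combination of a small subset of shadowsamples.
            m = min(n_aff, len(shadows))
            chosen = shadows[rng.choice(len(shadows), size=m, replace=False)]
            weights = rng.dirichlet(np.ones(m))                 # weights sum to 1
            synthetic.append(weights @ chosen)
        return np.array(synthetic)

    # Example usage with hypothetical data:
    #   X_syn = loras_sketch(X_minority, n_samples=200, seed=0)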