Miguel Almeida

Understanding the double descent curve in Machine Learning

Nov 18, 2022

Luis Sa-Couto, Jose Miguel Ramos, Miguel Almeida, Andreas Wichert

Abstract: The theory of bias-variance used to serve as a guide for model selection when applying Machine Learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This led to the proposal of the double descent curve of performance by Belkin et al. Although it seems to describe a real, representative phenomenon, the field lacks a fundamental theoretical understanding of what is happening, what the consequences are for model selection, and when double descent is expected to occur. In this paper we develop a principled understanding of the phenomenon and sketch answers to these important questions. Furthermore, we report real experimental results that are correctly predicted by our proposed hypothesis.

BreachRadar: Automatic Detection of Points-of-Compromise

Sep 24, 2020

Miguel Araujo, Miguel Almeida, Jaime Ferreira, Luis Silva, Pedro Bizarro

Abstract: Bank transaction fraud results in over $13B in annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure. BreachRadar is a distributed alternating algorithm that assigns a probability of being compromised to the different possible locations. We implement this method using Apache Spark and show its linear scalability in the number of machines and transactions. BreachRadar is applied to two datasets with billions of real transaction records and fraud labels, and we provide multiple examples of real Points-of-Compromise that we are able to detect. We further show the effectiveness of our method when injecting Points-of-Compromise into one of these datasets, simultaneously achieving over 90% precision and recall when only 10% of the cards have been victims of fraud.

* 9 pages, 10 figures, published in SIAM's 2017 International Conference on Data Mining (SDM17)

Source Separation and Clustering of Phase-Locked Subspaces: Derivations and Proofs

Jun 13, 2011

Miguel Almeida, Jan-Hendrik Schleimer, José Bioucas-Dias, Ricardo Vigário

Abstract: Due to space limitations, our submission "Source Separation and Clustering of Phase-Locked Subspaces", accepted for publication in the IEEE Transactions on Neural Networks in 2011, presented some results without proof. Those proofs are provided in this paper.
