Fernando Jiménez

Permutation-based multi-objective evolutionary feature selection for high-dimensional data

Jan 24, 2025

Raquel Espinosa, Gracia Sánchez, José Palma, Fernando Jiménez

Abstract: Feature selection is a critical step in the analysis of high-dimensional data, where the number of features often vastly exceeds the number of samples. Effective feature selection not only improves model performance and interpretability but also reduces computational costs and mitigates the risk of overfitting. In this context, we propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach, but extending it to evaluate subsets of attributes rather than individual features. This extension more effectively captures how interactions among features influence model performance. The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets, with the objectives of maximizing the degradation in model performance when the selected features are shuffled, and minimizing the cardinality of the feature subset. The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets for classification and regression tasks, and compared against 9 well-established feature selection methods designed for high-dimensional problems, including the conventional permutation feature importance method. The results demonstrate the ability of our approach to balance accuracy and computational efficiency, providing a powerful tool for feature selection in complex, high-dimensional datasets.
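The core evaluation step described above, scoring a whole feature subset by how much model performance degrades when that subset is shuffled, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the least-squares model, the R² score, the helper names (`fit_linear_scorer`, `subset_permutation_importance`, `objectives`, `dominates`) and the choice to shuffle the subset's columns with one shared row permutation are all hypothetical. The evolutionary search itself is omitted; only the two objectives and the Pareto-dominance test such a search would rely on are shown.

```python
import numpy as np

def fit_linear_scorer(X, y):
    """Fit ordinary least squares and return a function computing R^2 on given data.
    (Hypothetical stand-in model; any fitted predictor could be scored instead.)"""
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

    def r2(X_eval, y_eval):
        pred = np.column_stack([X_eval, np.ones(len(X_eval))]) @ w
        return 1.0 - np.sum((y_eval - pred) ** 2) / np.sum((y_eval - y_eval.mean()) ** 2)

    return r2

def subset_permutation_importance(score, X, y, subset, n_repeats=5, seed=0):
    """Mean drop in score when the columns in `subset` are shuffled together.
    Assumption: one shared row permutation per repeat, which keeps the subset's
    internal joint distribution but breaks its link to y and to the other columns."""
    rng = np.random.default_rng(seed)
    baseline = score(X, y)
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        rows = rng.permutation(len(X))
        X_perm[:, subset] = X[rows][:, subset]
        drops.append(baseline - score(X_perm, y))
    return float(np.mean(drops))

def objectives(importance, subset):
    """Two objectives, both minimized: negated importance and subset cardinality."""
    return (-importance, len(subset))

def dominates(a, b):
    """Pareto dominance for minimization problems."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Usage on synthetic data in which only columns 0 and 1 influence y.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=200)
score = fit_linear_scorer(X, y)
imp_relevant = subset_permutation_importance(score, X, y, [0, 1])
imp_noise = subset_permutation_importance(score, X, y, [3, 4])
```

Shuffling the selected columns jointly, rather than one at a time, is what lets a subset's score reflect interactions among its members; whether the paper uses exactly this shuffling scheme is not stated in the abstract.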

Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting

Dec 29, 2023

Raquel Espinosa, Fernando Jiménez, José Palma

Abstract: Time series forecasting plays a crucial role in diverse fields, necessitating the development of robust models that can effectively handle complex temporal patterns. In this article, we present a novel feature selection method embedded in Long Short-Term Memory networks, leveraging a multi-objective evolutionary algorithm. Our approach optimizes the weights and biases of the LSTM in a partitioned manner, with each objective function of the evolutionary algorithm targeting the root mean square error in a specific data partition. The set of non-dominated forecast models identified by the algorithm is then used to construct a meta-model through stacking-based ensemble learning. Furthermore, our proposed method provides a way to determine attribute importance, as the frequency with which each attribute is selected across the set of non-dominated forecasting models reflects its significance. This attribute importance insight adds an interpretable dimension to the forecasting process. Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs, effectively reducing overfitting. Comparative analyses against the state-of-the-art CancelOut and EAR-FS methods highlight the superior performance of our approach.
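Three of the mechanisms described above (one RMSE objective per data partition, a stacking meta-model over the non-dominated forecasters, and attribute importance as selection frequency) can be illustrated with the short NumPy sketch below. It is hypothetical: the contiguous partitioning, the linear least-squares meta-learner and all function names are assumptions, and neither the LSTM nor the evolutionary search is reproduced.

```python
import numpy as np

def partition_rmse(y_true, y_pred, n_partitions):
    """RMSE on each contiguous chunk of the series; each value could serve as
    one objective of a multi-objective optimizer (assumed partitioning scheme)."""
    chunks_true = np.array_split(np.asarray(y_true, dtype=float), n_partitions)
    chunks_pred = np.array_split(np.asarray(y_pred, dtype=float), n_partitions)
    return [float(np.sqrt(np.mean((t - p) ** 2))) for t, p in zip(chunks_true, chunks_pred)]

def fit_stacking_weights(base_preds, y):
    """Least-squares meta-model over base forecasts (one column per model, plus intercept).
    A linear meta-learner is an assumption; the paper's stacking learner may differ."""
    P = np.column_stack([base_preds, np.ones(len(y))])
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

def selection_frequency(masks):
    """Fraction of non-dominated models that use each attribute (1 = selected)."""
    return np.asarray(masks, dtype=float).mean(axis=0)

# Usage with toy numbers.
rmses = partition_rmse(np.arange(6.0), np.arange(6.0) + np.array([0, 0, 0, 1, 1, 1]), 2)
masks = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]]  # four hypothetical non-dominated models
freq = selection_frequency(masks)                     # attribute 0 is used by every model
p1 = np.arange(10.0)
p2 = np.arange(10.0) ** 2
weights = fit_stacking_weights(np.column_stack([p1, p2]), 0.5 * p1 + 0.5 * p2)
```

Here `rmses` is `[0.0, 1.0]` because the prediction errors fall only in the second half of the series, and `freq` ranks the first attribute as most important because every model keeps it.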

Multivariate feature ranking of gene expression data

Nov 16, 2021

Fernando Jiménez, Gracia Sánchez, José Palma, Luis Miralles-Pechuán, Juan Botía

Abstract: Gene expression datasets are usually high-dimensional and therefore require efficient and effective methods for identifying the relative importance of their attributes. Because the search space of possible solutions is huge, attribute subset evaluation feature selection methods tend to be impractical, so feature ranking methods are used in these scenarios instead. Most of the feature ranking methods described in the literature are univariate, so they do not detect interactions between factors. In this paper we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, and we apply them to three gene expression classification problems. We show statistically that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance, as well as attribute subset evaluation feature selection methods based on correlation and consistency with a multi-objective evolutionary search strategy.
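The claim that univariate rankings miss interactions can be made concrete with a toy XOR dataset. The sketch below is an illustration only and is not the authors' ranking method: it contrasts a univariate Pearson correlation score with the classic inconsistency-rate consistency measure evaluated on a feature pair. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def pearson(xs, ys):
    """Pearson correlation between one feature and the class (a univariate score)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def inconsistency_rate(rows, labels, features):
    """Share of instances that do not carry the majority class of their value pattern
    on the given features (0.0 means the features fully determine the class)."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[f] for f in features)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)

# XOR: the class is determined by the pair of features, but by neither one alone.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
corr_f0 = pearson([r[0] for r in rows], labels)   # 0.0: looks irrelevant on its own
corr_f1 = pearson([r[1] for r in rows], labels)   # 0.0
inc_f0 = inconsistency_rate(rows, labels, [0])     # 0.5
inc_pair = inconsistency_rate(rows, labels, [0, 1])  # 0.0: the pair is fully consistent
```

A univariate ranking would place both features at the bottom, while a pairwise measure reveals that together they determine the class.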

A novel auction system for selecting advertisements in Real-Time bidding

Oct 22, 2020

Luis Miralles-Pechuán, Fernando Jiménez, José Manuel García

Abstract: Real-Time Bidding (RTB) is a new Internet advertising system that has become very popular in recent years. It works like a global auction in which advertisers bid to display their ads in publishers' ad slots. The most popular mechanism for selecting the winner of each auction is the Generalized Second-Price (GSP) auction, in which the advertiser with the highest bid wins and is charged the second-highest bid. In this paper, we propose an alternative auction system that considers not only the economic aspect but also other factors relevant to the functioning of the advertising system. These factors include, among others, the benefit that can be given to each advertiser, the probability of conversion from the advertisement, the probability that the visit is fraudulent, how balanced the networks participating in RTB are, and whether advertisers are paying no more than the market price. In addition, we propose a methodology based on genetic algorithms to optimize the selection of each advertiser. We also conducted experiments comparing the performance of the proposed model with the well-known Generalized Second-Price method. We believe that this new approach, which considers relevant aspects beyond price, offers greater benefits for RTB networks in the medium and long term.
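For reference, the baseline mechanism the paper compares against can be sketched in a few lines. This is a simplified, bid-only version of GSP written for illustration (the function name and data layout are hypothetical); deployed systems often also weight bids by ad quality and apply reserve prices, and the paper's multi-factor alternative is not reproduced here.

```python
def generalized_second_price(bids, n_slots=1):
    """Plain GSP: rank advertisers by bid; the winner of slot k pays the next-highest
    bid (0.0 if no lower bid exists). Quality weighting and reserve prices are omitted."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    outcome = []
    for k in range(min(n_slots, len(ranked))):
        advertiser = ranked[k][0]
        price = ranked[k + 1][1] if k + 1 < len(ranked) else 0.0
        outcome.append((advertiser, price))
    return outcome

bids = {"A": 2.5, "B": 4.0, "C": 3.1}
single_slot = generalized_second_price(bids)           # [("B", 3.1)]
two_slots = generalized_second_price(bids, n_slots=2)  # [("B", 3.1), ("C", 2.5)]
```

With a single ad slot, as in a typical RTB impression, GSP reduces to a second-price auction: the highest bidder wins but pays only the runner-up's bid.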

A Deep Q-learning/genetic Algorithms Based Novel Methodology For Optimizing Covid-19 Pandemic Government Actions

May 15, 2020

Luis Miralles-Pechuán, Fernando Jiménez, Hiram Ponce, Lourdes Martínez-Villaseñor

Abstract: Whenever countries are threatened by a pandemic, as is the case with COVID-19, governments should take the right actions to safeguard public health while mitigating the negative effects on the economy. In this regard, governments can take two very different approaches: a restrictive one, in which drastic measures such as self-isolation can seriously damage the economy, and a more liberal one, in which relaxed restrictions may put a high percentage of the population at risk. The optimal approach could lie somewhere in between, and making the right decisions requires accurately estimating the future effects of each possible measure. In this paper, we use the SEIR (Susceptible-Exposed-Infected-Recovered) epidemiological model for infectious diseases to represent the evolution of COVID-19 in the population over time. To find the best sequences of actions governments can take, we propose a methodology with two approaches, one based on Deep Q-Learning and another based on Genetic Algorithms. The sequences of actions (confinement, self-isolation, two-meter distancing, or no restrictions) are evaluated by a reward system focused on two objectives: first, keeping the number of infections low so that hospitals are not overwhelmed with critical patients, and second, avoiding drastic measures for too long, since they can cause serious damage to the economy. The experiments show that our methodology is a valid tool for discovering actions governments can take to reduce the negative effects of a pandemic on both fronts. We also show that the approach based on Deep Q-Learning outperforms the one based on Genetic Algorithms for optimizing the sequences of actions.
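The SEIR dynamics that serve as the environment can be sketched as a daily Euler discretization of the standard compartmental equations dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, dR/dt = γI. Every number here is an assumption for illustration: the parameter values, the 7-day action period and especially the per-action multipliers on the transmission rate β are hypothetical and not taken from the paper. The reward system and the Deep Q-Learning and genetic-algorithm optimizers are not shown.

```python
# Hypothetical multipliers on the transmission rate for each government action.
ACTION_BETA_FACTOR = {
    "no_restrictions": 1.0,
    "two_meter_distance": 0.7,
    "self_isolation": 0.5,
    "confinement": 0.2,
}

def simulate_seir(actions, n=1_000_000, e0=100.0, i0=10.0,
                  beta=0.5, sigma=1 / 5.2, gamma=1 / 10, days_per_action=7):
    """Euler-step SEIR with one-day steps; each action is applied for `days_per_action` days.
    Returns the daily infectious counts and the final (S, E, I, R) state."""
    s, e, i, r = n - e0 - i0, e0, i0, 0.0
    infectious = []
    for action in actions:
        b = beta * ACTION_BETA_FACTOR[action]
        for _ in range(days_per_action):
            new_exposed = b * s * i / n   # S -> E
            new_infectious = sigma * e    # E -> I
            new_recovered = gamma * i     # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
            infectious.append(i)
    return infectious, (s, e, i, r)

# Compare eight weeks without restrictions against eight weeks of confinement.
curve_free, state_free = simulate_seir(["no_restrictions"] * 8)
curve_conf, state_conf = simulate_seir(["confinement"] * 8)
```

An optimizer such as Deep Q-Learning or a genetic algorithm would search over the `actions` sequence, scoring each candidate with a reward that trades the infectious peak against the time spent under drastic measures.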
