Carlos Sevilla-Salcedo

Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

Nov 11, 2024

Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Jussi Tohka, Vanessa Gómez-Verdejo

Abstract:We present BALDUR, a novel Bayesian algorithm designed to deal with multi-modal datasets and small sample sizes in high-dimensional settings while providing explainable solutions. To do so, the proposed model combines within a common latent space the different data views to extract the relevant information to solve the classification task and prune out the irrelevant/redundant features/data views. Furthermore, to provide generalizable solutions in small sample size scenarios, BALDUR efficiently integrates dual kernels over the views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. This model was tested over two different neurodegeneration datasets, outperforming the state-of-the-art models and detecting features aligned with markers already described in the scientific literature.

* 36 pages, 3 figures and 3 tables

The Relevance Feature and Vector Machine for health applications

Feb 11, 2024

Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Emilio Parrado-Hernández, Vanessa Gómez-Verdejo

Abstract:This paper presents the Relevance Feature and Vector Machine (RFVM), a novel model that addresses the challenges of the fat-data problem when dealing with clinical prospective studies. The fat-data problem refers to the limitations of Machine Learning (ML) algorithms when working with databases in which the number of features is much larger than the number of samples (a common scenario in certain medical fields). To overcome such limitations, the RFVM incorporates different characteristics: (1) A Bayesian formulation which enables the model to infer its parameters without overfitting thanks to Bayesian model averaging. (2) A joint optimisation that overcomes the limitations arising from the fat-data characteristic by simultaneously including the variables that define the primal space (features) and those that define the dual space (observations). (3) An integrated pruning that removes the irrelevant features and samples during the iterative training optimisation. This last point turns out to be crucial when performing medical prospective studies: it enables researchers to exclude unnecessary medical tests, reducing costs and inconvenience for patients, and to identify the critical patients/subjects that characterize the disorder, which in turn optimizes the patient recruitment process that leads to a balanced cohort. The model capabilities are tested against state-of-the-art models in several medical datasets with fat-data problems. These experiments show that RFVM is capable of achieving competitive classification accuracies while providing the most compact subset of data (in terms of both features and samples). Moreover, the selected features (medical tests) seem to be aligned with the existing medical literature.

* 19 pages of main text, 12 pages of appendices, 2 figures and 5 tables
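
The primal (feature) and dual (observation) views mentioned in the abstract above can be illustrated with a generic linear model. The following is a minimal sketch, not the RFVM itself: the data are random, the dual coefficients `a` are hypothetical placeholder values, and the only point is that a score computed from D primal weights equals the score computed from N dual coefficients when the weights are w = X^T a.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

rng = random.Random(0)
N, D = 5, 200  # fat data: far more features (D) than samples (N)
X = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
x_new = [rng.gauss(0.0, 1.0) for _ in range(D)]

# Dual view: one coefficient per training sample (hypothetical values).
a = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Primal view: one weight per feature, implied by the dual coefficients.
w = [sum(a[i] * X[i][j] for i in range(N)) for j in range(D)]

primal_score = dot(w, x_new)                                 # uses D weights
dual_score = sum(a[i] * dot(X[i], x_new) for i in range(N))  # uses N coefficients
print(abs(primal_score - dual_score) < 1e-8)
```

In this toy setting, zeroing an entry of `w` discards a feature and zeroing an entry of `a` discards a sample, which is the intuition behind pruning in both spaces.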

Optimizing Feature Selection for Binary Classification with Noisy Labels: A Genetic Algorithm Approach

Jan 12, 2024

Vandad Imani, Elaheh Moradi, Carlos Sevilla-Salcedo, Vittorio Fortino, Jussi Tohka

Abstract:Feature selection in noisy label scenarios remains an understudied topic. We propose a novel genetic algorithm-based approach, the Noise-Aware Multi-Objective Feature Selection Genetic Algorithm (NMFS-GA), for selecting optimal feature subsets in binary classification with noisy labels. NMFS-GA offers a unified framework for selecting feature subsets that are both accurate and interpretable. We evaluate NMFS-GA on synthetic datasets with label noise, a Breast Cancer dataset enriched with noisy features, and a real-world ADNI dataset for dementia conversion prediction. Our results indicate that NMFS-GA can effectively select feature subsets that improve the accuracy and interpretability of binary classifiers in scenarios with noisy labels.

Multi-Objective Genetic Algorithm for Multi-View Feature Selection

May 26, 2023

Vandad Imani, Carlos Sevilla-Salcedo, Vittorio Fortino, Jussi Tohka

Abstract:Multi-view datasets offer diverse forms of data that can enhance prediction models by providing complementary information. However, the use of multi-view data leads to an increase in high-dimensional data, which poses significant challenges for the prediction models that can lead to poor generalization. Therefore, relevant feature selection from multi-view datasets is important as it not only addresses the poor generalization but also enhances the interpretability of the models. Despite the success of traditional feature selection methods, they have limitations in leveraging intrinsic information across modalities, lacking generalizability, and being tailored to specific classification tasks. We propose a novel genetic algorithm strategy to overcome these limitations of traditional feature selection methods for multi-view data. Our proposed approach, called the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within a view and between views under a unified framework. The MMFS-GA framework demonstrates superior performance and interpretability for feature selection on multi-view datasets in both binary and multiclass classification tasks. The results of our evaluations on three benchmark datasets, including synthetic and real data, show improvement over the best baseline methods. This work provides a promising solution for multi-view feature selection and opens up new possibilities for further research in multi-view datasets.

Bayesian learning of feature spaces for multitasks problems

Sep 07, 2022

Carlos Sevilla-Salcedo, Ascensión Gallardo-Antolín, Vanessa Gómez-Verdejo, Emilio Parrado-Hernández

Abstract:This paper presents a Bayesian framework to construct non-linear, parsimonious, shallow models for multitask regression. The proposed framework relies on the fact that Random Fourier Features (RFFs) enable the approximation of an RBF kernel by an extreme learning machine whose hidden layer is formed by RFFs. The main idea is to combine the two dual views of the same model under a single Bayesian formulation that extends the Sparse Bayesian Extreme Learning Machines to multitask problems. From the kernel methods point of view, the proposed formulation facilitates the introduction of prior domain knowledge through the RBF kernel parameter. From the extreme learning machines perspective, the new formulation helps control overfitting and enables a parsimonious overall model (the models that serve each task share the same set of RFFs selected within the joint Bayesian optimisation). The experimental results show that combining advantages from kernel methods and extreme learning machines within the same framework can lead to significant improvements in the performance achieved by each of these two paradigms independently.
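
The RFF approximation of an RBF kernel that the abstract above relies on can be sketched as follows. This is a minimal illustration of the standard random Fourier feature construction, not the paper's Bayesian model: the kernel is assumed to be exp(-gamma * ||x - y||^2), and the function names, the value of `gamma`, and the toy data are all assumptions made for the example.

```python
import math
import random

def rbf(x, y, gamma):
    """Exact RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))

def sample_rff(d, D, gamma, rng):
    """Draw D frequencies w ~ N(0, 2*gamma*I) and phases b ~ U[0, 2*pi)."""
    std = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(D)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    return W, b

def rff_map(x, W, b):
    """Hidden-layer activations z(x) = sqrt(2/D) * cos(W x + b)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
            for w, bi in zip(W, b)]

rng = random.Random(0)
d, D, gamma = 3, 2000, 0.5
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(15)]
W, b = sample_rff(d, D, gamma, rng)
Z = [rff_map(x, W, b) for x in X]

# Inner products of the random features approximate the exact kernel.
max_err = max(
    abs(rbf(X[i], X[j], gamma) - sum(p * q for p, q in zip(Z[i], Z[j])))
    for i in range(len(X)) for j in range(len(X))
)
print(f"max |K_exact - K_rff| = {max_err:.3f}")
```

The approximation error shrinks roughly as 1/sqrt(D), so the number of random features trades accuracy against model size; selecting a shared subset of these features is where a sparse formulation such as the one described in the abstract would come in.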

Multi-view hierarchical Variational AutoEncoders with Factor Analysis latent space

Jul 19, 2022

Alejandro Guerrero-López, Carlos Sevilla-Salcedo, Vanessa Gómez-Verdejo, Pablo M. Olmos

Abstract:Real-world databases are complex: they usually present redundancy and shared correlations between heterogeneous and multiple representations of the same data. Thus, exploiting and disentangling shared information between views is critical. For this purpose, recent studies often fuse all views into a shared nonlinear complex latent space, but they lose interpretability. To overcome this limitation, here we propose a novel method to combine multiple Variational AutoEncoder (VAE) architectures with a Factor Analysis latent space (FA-VAE). Concretely, we use a VAE to learn a private representation of each heterogeneous view in a continuous latent space. Then, we model the shared latent space by projecting every private variable to a low-dimensional latent space using a linear projection matrix. Thus, we create an interpretable hierarchical dependency between private and shared information. This way, the novel model is able to simultaneously: (i) learn from multiple heterogeneous views, (ii) obtain an interpretable hierarchical shared space, and (iii) perform transfer learning between generative models.

* 20 pages main work, 2 pages supplementary, 14 figures

Multi-task longitudinal forecasting with missing values on Alzheimer's Disease

Jan 13, 2022

Carlos Sevilla-Salcedo, Vandad Imani, Pablo M. Olmos, Vanessa Gómez-Verdejo, Jussi Tohka

Abstract:Machine learning techniques typically applied to dementia forecasting lack the capability to jointly learn several tasks and to handle time-dependent heterogeneous data and missing values. In this paper, we propose a framework using the recently presented SSHIBA model for jointly learning different tasks on longitudinal data with missing values. The method uses Bayesian variational inference to impute missing values and combine information from several views. This way, we can combine different data-views from different time-points in a common latent space and learn the relations between each time-point while simultaneously modelling and predicting several output variables. We apply this model to jointly predict diagnosis, ventricle volume, and clinical scores in dementia. The results demonstrate that SSHIBA is capable of learning a good imputation of the missing values and outperforming the baselines while simultaneously predicting three different tasks.

Bayesian Sparse Factor Analysis with Kernelized Observations

Jun 10, 2020

Carlos Sevilla-Salcedo, Alejandro Guerrero-López, Pablo M. Olmos, Vanessa Gómez-Verdejo

Abstract:Latent variable models for multi-view learning attempt to find low-dimensional projections that fairly capture the correlations among multiple views that characterise each datum. High-dimensional views in medium-sized datasets and non-linear problems are traditionally handled by kernel methods, inducing a (non)-linear function between the latent projection and the data itself. However, they usually come with scalability issues and exposure to overfitting. To overcome these limitations, instead of imposing a kernel function, here we propose an alternative method. In particular, we combine probabilistic factor analysis with what we refer to as kernelized observations, in which the model focuses on reconstructing not the data itself, but its correlation with other data points measured by a kernel function. This model can combine several types of views (kernelized or not), can handle heterogeneous data and work in semi-supervised settings. Additionally, by including adequate priors, it can provide compact solutions for the kernelized observations (based on an automatic selection of Bayesian support vectors) and can include feature selection capabilities. Using several public databases, we demonstrate the potential of our approach (and its extensions) w.r.t. common multi-view learning models such as kernel canonical correlation analysis or manifold relevance determination Gaussian processes latent variable models.

* Article submitted to NeurIPS 2020
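
The notion of kernelized observations in the abstract above can be pictured with a small sketch. This is an assumption-laden illustration rather than the paper's model: an RBF kernel with a hand-picked `gamma` is used, the helper name `kernelize` is hypothetical, and the data are random. Each sample is replaced by its vector of kernel similarities to all samples, so an N x D view becomes an N x N view.

```python
import math
import random

def rbf(x, y, gamma):
    """RBF similarity exp(-gamma * ||x - y||^2) between two samples."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))

def kernelize(X, gamma):
    """Replace each sample by its similarities to every sample in X."""
    return [[rbf(xi, xj, gamma) for xj in X] for xi in X]

rng = random.Random(1)
N, D = 6, 500  # few samples, many features
X = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]

K = kernelize(X, gamma=1.0 / D)  # N x N view that a latent model could reconstruct
print(len(K), len(K[0]))
```

A factor analysis model applied to `K` instead of `X` works with N columns rather than D, and a sparsity prior over those columns would amount to choosing which samples act as support vectors.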

Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis

Jan 24, 2020

Carlos Sevilla-Salcedo, Vanessa Gómez-Verdejo, Pablo M. Olmos

Abstract:The Bayesian approach to feature extraction, known as factor analysis (FA), has been widely studied in machine learning to obtain a latent representation of the data. An adequate selection of the probabilities and priors of these Bayesian models allows the model to better adapt to the nature of the data (i.e. heterogeneity, sparsity), obtaining a more representative latent space. The objective of this article is to propose a general FA framework capable of modelling any problem. To do so, we start from the Bayesian Inter-Battery Factor Analysis (BIBFA) model, enhancing it with new functionalities to be able to work with heterogeneous data, include feature selection, and handle missing values as well as semi-supervised problems. The performance of the proposed model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis (SSHIBA), has been tested on 4 different scenarios to evaluate each one of its novelties, showing not only great versatility and an interpretability gain, but also outperforming most of the state-of-the-art algorithms.
