Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rita T. Sousa

Multi-dataset and Transfer Learning Using Gene Expression Knowledge Graphs

Mar 26, 2025

Rita T. Sousa, Heiko Paulheim

Abstract:Gene expression datasets offer insights into gene regulation mechanisms, biochemical pathways, and cellular functions. Additionally, comparing gene expression profiles between disease and control patients can deepen the understanding of disease pathology. Therefore, machine learning has been used to process gene expression data, with patient diagnosis emerging as one of the most popular applications. Although gene expression data can provide valuable insights, challenges arise because the number of patients in expression datasets is usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel methodology to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. Then, vector representations are produced using knowledge graph embedding techniques, which are used as inputs for a graph neural network and a multi-layer perceptron. We evaluate the efficacy of our methodology in three settings: single-dataset learning, multi-dataset learning, and transfer learning. The experimental results show that combining gene expression datasets and domain-specific knowledge improves patient diagnosis in all three settings.

* Accepted at the Extended Semantic Web Conference 2025

Via

Access Paper or Ask Questions

A Prompt Engineering Approach and a Knowledge Graph based Framework for Tackling Legal Implications of Large Language Model Answers

Oct 19, 2024

George Hannah, Rita T. Sousa, Ioannis Dasoulas, Claudia d'Amato

Abstract:With the recent surge in popularity of Large Language Models (LLMs), there is the rising risk of users blindly trusting the information in the response, even in cases where the LLM recommends actions that have potential legal implications and this may put the user in danger. We provide an empirical analysis on multiple existing LLMs showing the urgency of the problem. Hence, we propose a short-term solution consisting in an approach for isolating these legal issues through prompt re-engineering. We further analyse the outcomes but also the limitations of the prompt engineering based approach and we highlight the need of additional resources for fully solving the problem We also propose a framework powered by a legal knowledge graph (KG) to generate legal citations for these legal issues, enriching the response of the LLM.

* 27 pages, 2 figures

Via

Access Paper or Ask Questions

Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

Apr 23, 2024

Rita T. Sousa, Heiko Paulheim

Figure 1 for Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

Figure 2 for Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

Figure 3 for Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

Figure 4 for Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

Abstract:Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types, namely gene expression data. While gene expression data can provide valuable insights, challenges arise from the fact that the sample sizes in expression datasets are usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.

* 11 pages, 4 figures, 7th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics at ESWC2024

Via

Access Paper or Ask Questions

Biomedical Knowledge Graph Embeddings with Negative Statements

Aug 07, 2023

Rita T. Sousa, Sara Silva, Heiko Paulheim, Catia Pesquita

Abstract:A knowledge graph is a powerful representation of real-world entities and their relations. The vast majority of these relations are defined as positive statements, but the importance of negative statements is increasingly recognized, especially under an Open World Assumption. Explicitly considering negative statements has been shown to improve performance on tasks such as entity summarization and question answering or domain-specific tasks such as protein function prediction. However, no attention has been given to the exploration of negative statements by knowledge graph embedding approaches despite the potential of negative statements to produce more accurate representations of entities in a knowledge graph. We propose a novel approach, TrueWalks, to incorporate negative statements into the knowledge graph representation learning process. In particular, we present a novel walk-generation method that is able to not only differentiate between positive and negative statements but also take into account the semantic implications of negation in ontology-rich knowledge graphs. This is of particular importance for applications in the biomedical domain, where the inadequacy of embedding approaches regarding negative statements at the ontology level has been identified as a crucial limitation. We evaluate TrueWalks in ontology-rich biomedical knowledge graphs in two different predictive tasks based on KG embeddings: protein-protein interaction prediction and gene-disease association prediction. We conduct an extensive analysis over established benchmarks and demonstrate that our method is able to improve the performance of knowledge graph embeddings on all tasks.

* 19 pages, 4 figures

Via

Access Paper or Ask Questions

Benchmark datasets for biomedical knowledge graphs with negative statements

Jul 21, 2023

Rita T. Sousa, Sara Silva, Catia Pesquita

Abstract:Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. The negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been demonstrated to improve the performance of several applications, namely in the biomedical domain. However, no benchmark dataset supports the evaluation of the methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks - protein-protein interaction prediction, gene-disease association prediction and disease prediction - that aim at circumventing the difficulties in building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task. The results show that the negative statements can improve the performance of knowledge graph embeddings.

Via

Access Paper or Ask Questions

Explainable Representations for Relation Prediction in Knowledge Graphs

Jun 22, 2023

Rita T. Sousa, Sara Silva, Catia Pesquita

Abstract:Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects.

* 16 pages, 3 figures

Via

Access Paper or Ask Questions

Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies

May 31, 2021

Susana Nunes, Rita T. Sousa, Catia Pesquita

Figure 1 for Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies

Figure 2 for Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies

Figure 3 for Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies

Abstract:Ontology-based approaches for predicting gene-disease associations include the more classical semantic similarity methods and more recently knowledge graph embeddings. While semantic similarity is typically restricted to hierarchical relations within the ontology, knowledge graph embeddings consider their full breadth. However, embeddings are produced over a single graph and complex tasks such as gene-disease association may require additional ontologies. We investigate the impact of employing richer semantic representations that are based on more than one ontology, able to represent both genes and diseases and consider multiple kinds of relations within the ontologies. Our experiments demonstrate the value of employing knowledge graph embeddings based on random-walks and highlight the need for a closer integration of different ontologies.

* 4 pages, 1 figure, 2 tables

Via

Access Paper or Ask Questions