James Henderson

Idiap Research Institute

Reduction of Supervision for Biomedical Knowledge Discovery

Apr 13, 2025

Christos Theodoropoulos, Andrei Catalin Coman, James Henderson, Marie-Francine Moens

Abstract:Knowledge discovery is hindered by the increasing volume of publications and the scarcity of extensive annotated data. To tackle the challenge of information overload, it is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. While supervised techniques generally result in better performance, they have the major drawback of demanding labeled data. This requirement is labor-intensive and time-consuming and hinders scalability when exploring new domains. In this context, our study addresses the challenge of identifying semantic relationships between biomedical entities (e.g., diseases, proteins) in unstructured text while minimizing dependency on supervision. We introduce a suite of unsupervised algorithms based on dependency trees and attention mechanisms and employ a range of pointwise binary classification methods. Transitioning from weakly supervised to fully unsupervised settings, we assess the methods' ability to learn from data with noisy labels. The evaluation on biomedical benchmark datasets explores the effectiveness of the methods. Our approach tackles a central issue in knowledge discovery: balancing performance with minimal supervision. By gradually decreasing supervision, we assess the robustness of pointwise binary classification techniques in handling noisy labels, revealing their capability to shift from weakly supervised to entirely unsupervised scenarios. Comprehensive benchmarking offers insights into the effectiveness of these techniques, suggesting an encouraging direction toward adaptable knowledge discovery systems, representing progress in creating data-efficient methodologies for extracting useful insights when annotated data is limited.

* Published as part of the PhD dissertation: Theodoropoulos, Christos, Marie-Francine Moens, and Matthew Blaschko. "Deep Learning Models for the Extraction of Knowledge from Text." (2025)

Data-driven Discovery of Biophysical T Cell Receptor Co-specificity Rules

Dec 18, 2024

Andrew G. T. Pyo, Yuta Nagano, Martina Milighetti, James Henderson, Curtis G. Callan Jr., Benny Chain, Ned S. Wingreen, Andreas Tiffeau-Mayer

Abstract:The biophysical interactions between the T cell receptor (TCR) and its ligands determine the specificity of the cellular immune response. However, the immense diversity of receptors and ligands has made it challenging to discover generalizable rules across the distinct binding affinity landscapes created by different ligands. Here, we present an optimization framework for discovering biophysical rules that predict whether TCRs share specificity to a ligand. Applying this framework to TCRs associated with a collection of SARS-CoV-2 peptides, we establish how co-specificity depends on the type and position of amino-acid differences between receptors. We also demonstrate that the inferred rules generalize to ligands not seen during training. Our analysis reveals that matching of steric properties between substituted amino acids is important for receptor co-specificity, in contrast with the hydrophobic properties that more prominently determine evolutionary substitutability. We furthermore find that positions not in direct contact with the peptide still significantly impact specificity. These findings highlight the potential for data-driven approaches to uncover the molecular mechanisms underpinning the specificity of adaptive immune responses.

* 15 pages, 10 figures

Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors

Aug 13, 2024

Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson

Abstract:Link prediction models can benefit from incorporating textual descriptions of entities and relations, enabling fully inductive learning and flexibility in dynamic graphs. We address the challenge of also capturing rich structured information about the local neighbourhood of entities and their relations, by introducing a Transformer-based approach that effectively integrates textual descriptions with graph structure, reducing the reliance on resource-intensive text encoders. Our experiments on three challenging datasets show that our Fast-and-Frugal Text-Graph (FnF-TG) Transformers achieve superior performance compared to the previous state-of-the-art methods, while maintaining efficiency and scalability.

Enhancing Biomedical Knowledge Discovery for Diseases: An End-To-End Open-Source Framework

Jul 18, 2024

Christos Theodoropoulos, Andrei Catalin Coman, James Henderson, Marie-Francine Moens

Abstract:The ever-growing volume of biomedical publications creates a critical need for efficient knowledge discovery. In this context, we introduce an open-source end-to-end framework designed to construct knowledge around specific diseases directly from raw text. To facilitate research in disease-related knowledge discovery, we create two annotated datasets focused on Rett syndrome and Alzheimer's disease, enabling the identification of semantic relations between biomedical entities. Extensive benchmarking explores various ways to represent relations and entity representations, offering insights into optimal modeling strategies for semantic relation detection and highlighting language models' competence in knowledge discovery. We also conduct probing experiments using different layer representations and attention scores to explore transformers' ability to capture semantic relations.

* Under Review

Contrastive learning of T cell receptor representations

Jun 10, 2024

Yuta Nagano, Andrew Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer

Abstract:Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.

* 19 pages, 17 figures
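
The contrastive pre-training idea in the abstract above can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under stated assumptions, not SCEPTR's actual objective or code: the function names `cosine` and `info_nce`, the `temperature` value, and the toy two-dimensional embeddings are all hypothetical. It assumes each item has two embedded "views" (for example, two noisy encodings of the same receptor) and rewards the model when matching views are more similar to each other than to views of other items.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    view_a[i] and view_b[i] are embeddings of two views of item i (the
    positive pair); every view_b[j] with j != i serves as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true partner of item i.
        total += log_denominator - logits[i]
    return total / n


# Toy batch: two items with orthogonal embeddings (illustrative values only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]     # each view agrees with its anchor
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # each view agrees with the wrong anchor

loss_matched = info_nce(anchors, matched)
loss_mismatched = info_nce(anchors, mismatched)
```

In this toy batch the loss is close to zero when the paired views agree and large when they are swapped, which is the signal a contrastive objective of this kind uses to shape the embedding space.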

Nonparametric Variational Regularisation of Pretrained Transformers

Dec 01, 2023

Fabio Fehr, James Henderson

Abstract:The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has led to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model to the new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models.

Transformers as Graph-to-Graph Models

Oct 27, 2023

James Henderson, Alireza Mohammadshahi, Andrei C. Coman, Lesly Miculicich

Abstract:We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.

* Accepted to Big Picture workshop at EMNLP 2023
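
The abstract's idea of "inputting graph edges into the attention weight computations" can be sketched as single-head attention in which an embedding of the edge label between two nodes is added to the key before scoring. This is a simplified illustration under stated assumptions, not the paper's implementation: the function name `edge_aware_attention`, the choice to add the edge embedding to the key, the use of label 0 for "no edge", and the toy vectors are all hypothetical, and learned projection matrices are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def edge_aware_attention(queries, keys, values, edge_labels, edge_embeddings):
    """Single-head attention whose scores depend on an input graph.

    edge_labels[i][j] is an integer label for the edge from node i to node j
    (0 is assumed to mean "no edge"); edge_embeddings[label] is a vector
    added to key j before it is scored against query i.
    Returns (outputs, weights).
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            edge = edge_embeddings[edge_labels[i][j]]
            k_ij = [kx + ex for kx, ex in zip(k, edge)]
            scores.append(sum(qx * kx for qx, kx in zip(q, k_ij)) / math.sqrt(d))
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Two nodes with orthogonal vectors; label 1 marks a labelled edge (toy values).
x = [[1.0, 0.0], [0.0, 1.0]]
edge_embeddings = [[0.0, 0.0], [2.0, 0.0]]  # label 0: no edge, label 1: edge
no_edges = [[0, 0], [0, 0]]
one_edge = [[0, 1], [0, 0]]                 # edge from node 0 to node 1

_, plain_weights = edge_aware_attention(x, x, x, no_edges, edge_embeddings)
_, graph_weights = edge_aware_attention(x, x, x, one_edge, edge_embeddings)
```

With no edges the function reduces to ordinary scaled dot-product attention; adding the edge from node 0 to node 1 shifts node 0's attention toward node 1 while leaving node 1's attention row unchanged.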

Learning to Abstract with Nonparametric Variational Information Bottleneck

Oct 26, 2023

Melika Behjati, Fabio Fehr, James Henderson

Abstract:Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can learn to compress to different levels of abstraction at different layers of the same model. We apply Nonparametric Variational Information Bottleneck (NVIB) to stacked Transformer self-attention layers in the encoder, which encourages an information-theoretic compression of the representations through the model. We find that the layers within the model correspond to increasing levels of abstraction and that their representations are more linguistically informed. Finally, we show that NVIB compression results in a model which is more robust to adversarial perturbations.

* Accepted to Findings of EMNLP 2023

GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction

Aug 28, 2023

Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson

Abstract:Document-level relation extraction aims to identify relationships between entities within a document. Current methods rely on text-based encoders and employ various hand-coded pooling heuristics to aggregate information from entity mentions and associated contexts. In this paper, we replace these rigid pooling functions with explicit graph relations by leveraging the intrinsic graph processing capabilities of the Transformer model. We propose a joint text-graph Transformer model, and a graph-assisted declarative pooling (GADePo) specification of the input which provides explicit and high-level instructions for information aggregation. This allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customizable pooling strategies. We extensively evaluate our method across diverse datasets and models, and show that our approach yields promising results that are comparable to those achieved by the hand-coded pooling functions.

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

May 15, 2023

Rabeeh Karimi Mahabadi, Jaesung Tae, Hamish Ivison, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Abstract:Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various domains with continuous-valued inputs. Despite the promises of fully non-autoregressive text generation, applying diffusion models to natural language remains challenging due to its discrete nature. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the typical learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models and is competitive with pretrained autoregressive sequence-to-sequence models.

* 9 pages, 4 figures, preprint
