Solen Quiniou

Summarization for Generative Relation Extraction in the Microbiome Domain

Jun 10, 2025

Oumaima El Khettari, Solen Quiniou, Samuel Chaffron

Abstract: We explore a generative relation extraction (RE) pipeline tailored to the study of interactions in the intestinal microbiome, a complex and low-resource biomedical domain. Our method leverages summarization with large language models (LLMs) to refine context before extracting relations via instruction-tuned generation. Preliminary results on a dedicated corpus show that summarization improves generative RE performance by reducing noise and guiding the model. However, BERT-based RE approaches still outperform generative models. This ongoing work demonstrates the potential of generative methods to support the study of specialized domains in low-resource settings.

DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain

Feb 20, 2024

Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Beatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud(+1 more)

Abstract: The biomedical domain has sparked significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for the assessment of the intrinsic qualities of PLMs from various perspectives. Although still limited to a few languages, this initiative has been undertaken in the biomedical field, notably for English and Chinese. This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks. To bridge this research gap and account for the unique sensitivities of French, we present the first-ever publicly available French biomedical language understanding benchmark, called DrBenchmark. It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question-answering, semantic textual similarity, and classification. We evaluate 8 state-of-the-art pre-trained masked language models (MLMs) on general and biomedical-specific data, as well as English-specific MLMs to assess their cross-lingual capabilities. Our experiments reveal that no single model excels across all tasks, while generalist models are sometimes still competitive.

* Accepted at LREC-Coling 2024

Building a Corpus for Biomedical Relation Extraction of Species Mentions

Jun 14, 2023

Oumaima El Khettari, Solen Quiniou, Samuel Chaffron

Abstract: We present a manually annotated corpus, Species-Species Interaction, for extracting meaningful binary relations between species in biomedical texts at the sentence level, with a focus on the gut microbiota. The corpus leverages PubTator to annotate species in full-text articles after evaluating different Named Entity Recognition species taggers. Our first results are promising for extracting relations between species using BERT and its biomedical variants.

* Accepted at BioNLP@ACL 2023

Automatic segmentation of texts into units of meaning for reading assistance

Oct 11, 2019

Jean-Claude Houbart, Solen Quiniou, Marion Berthaut, Béatrice Daille, Claire Salomé

Abstract: The emergence of the digital book is a major step forward in providing access to reading, and therefore often to the common culture and the labour market. By allowing the enrichment of texts with cognitive crutches, EPub 3-compatible accessibility formats such as FROG have proven their effectiveness in not only alleviating but also reducing dyslexic disorders. In this paper, we show how Artificial Intelligence, and particularly Transfer Learning with Google BERT, can automate the division into units of meaning, and thus facilitate the creation of enriched digital books at a moderate cost.

* 7 pages, 7 figures. Work presented at the International Joint Conferences on Artificial Intelligence (IJCAI) workshop on AI and the United Nations Sustainable Development Goals
