Sylvain Massip

Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning

Feb 21, 2024

Loïc Rakotoson, Sylvain Massip, Fréjus A. A. Laleye

Abstract: Information retrieval is a rapidly evolving field. However, it still faces significant limitations when handling vast amounts of scientific and industrial information, such as semantic divergence and vocabulary gaps in sparse retrieval, low precision and lack of interpretability in semantic search, or hallucination and outdated information in generative models. In this paper, we introduce a two-block approach to tackle these hurdles for long documents. The first block enhances language understanding in sparse retrieval by using query expansion to retrieve relevant documents. The second block deepens the result by providing comprehensive and informative answers to the complex question using only the information spread across the long document, enabling bidirectional engagement. At various stages of the pipeline, intermediate results are presented to users to facilitate understanding of the system's reasoning. We believe this bidirectional approach brings significant advancements in terms of transparency, logical thinking, and comprehensive understanding in the field of scientific information retrieval.

* 6 pages, 3 figures
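
To illustrate the first block described in the abstract above, here is a minimal sketch of query expansion for term-based (sparse) retrieval. It is not the authors' implementation: the `SYNONYMS` table, the function names, and the toy documents are all assumptions made for illustration, and a plain term-overlap count stands in for a real sparse scoring function.

```python
from collections import Counter

# Hypothetical synonym table; the abstract does not say where expansions come from.
SYNONYMS = {
    "heart": ["cardiac"],
    "attack": ["infarction"],
}

def expand_query(query):
    """Return the query terms plus any listed synonyms, preserving order."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

def overlap_score(terms, document):
    """Count occurrences of the query terms in the document (toy sparse score)."""
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in terms)

# Toy documents (assumed): the first is relevant but shares no word with the query.
docs = [
    "cardiac infarction risk factors in adults",
    "seasonal allergy treatment options",
]
query = "heart attack"

plain_scores = [overlap_score(query.lower().split(), d) for d in docs]
expanded_scores = [overlap_score(expand_query(query), d) for d in docs]
```

Without expansion, the relevant document scores zero because of the vocabulary gap; with expansion, it is the only document that matches.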

Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction

Jun 07, 2023

Fréjus A. A. Laleye, Loïc Rakotoson, Sylvain Massip

Abstract: Relation extraction is a crucial and challenging task in Natural Language Processing. Several recent methods perform notably well on the task; however, most of these approaches rely on vast amounts of data from large-scale knowledge graphs or language models pretrained on voluminous corpora. In this paper, we focus on using only the knowledge supplied by a corpus to create a high-performing model. Our objective is to show that by leveraging the hierarchical structure and relational distribution of entities within a corpus, without introducing external knowledge, a relation extraction model can achieve significantly enhanced performance. We therefore propose a relation extraction approach that incorporates knowledge graph embeddings pretrained at the corpus scale into the sentence-level contextual representation. We conducted a series of experiments that yielded promising results for our proposed approach. The results show that our method outperforms context-based relation extraction models.

* 15 pages, 1 figure, The 17th International Conference on Document Analysis and Recognition

Science Checker: Extractive-Boolean Question Answering For Scientific Fact Checking

Apr 29, 2022

Loïc Rakotoson, Charles Letaillieur, Sylvain Massip, Fréjus Laleye

Abstract: With the explosive growth of scientific publications, synthesizing scientific knowledge and checking facts is becoming an increasingly complex task. In this paper, we propose a multi-task approach for verifying scientific questions based on joint reasoning from facts and evidence in research articles. We propose an intelligent combination of (1) automatic information summarization and (2) Boolean Question Answering, which makes it possible to generate an answer to a scientific question from only the extracts obtained after summarization. Thus, on a given topic, our proposed approach conducts structured content modeling based on paper abstracts to answer a scientific question while highlighting the passages from papers that discuss the topic. We based our final system on an end-to-end Extractive Question Answering (EQA) model combined with a three-output classification model to perform in-depth semantic understanding of a question and to illustrate the aggregation of multiple responses. With our light and fast proposed architecture, we achieved an average error rate of 4% and an F1-score of 95.6%. Our results are supported by experiments with two QA models (BERT, RoBERTa) over 3 million Open Access (OA) articles in the medical and health domains on Europe PMC.

* Proceedings of the 1st International Workshop on Multimedia AI against Disinformation (2022)
* 8 pages, 4 figures
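
The abstract above mentions a three-output classification model and the aggregation of multiple responses. As a rough sketch only, one simple way to aggregate per-passage answers is a majority vote. The label names (`yes`, `no`, `neutral`), the tie-breaking rule, and the function name are assumptions for illustration; the abstract does not state how the authors aggregate.

```python
from collections import Counter

# Assumed names for the three outputs; the abstract only says there are three.
LABELS = ("yes", "no", "neutral")

def aggregate_answers(passage_labels):
    """Majority vote over per-passage labels; ties fall back to 'neutral'."""
    counts = Counter(passage_labels)
    total = len(passage_labels)
    # Share of passages supporting each label (all zeros when there is no evidence).
    shares = {label: (counts[label] / total if total else 0.0) for label in LABELS}
    best = max(shares.values())
    winners = [label for label in LABELS if shares[label] == best]
    verdict = winners[0] if len(winners) == 1 else "neutral"
    return verdict, shares

# Example: four retrieved passages, each already classified by some model.
verdict, shares = aggregate_answers(["yes", "yes", "no", "neutral"])
```

Returning the per-label shares alongside the verdict keeps the supporting evidence visible rather than collapsing it into a single answer.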

BagBERT: BERT-based bagging-stacking for multi-topic classification

Nov 10, 2021

Loïc Rakotoson, Charles Letaillieur, Sylvain Massip, Fréjus Laleye

Abstract: This paper describes our submission to the COVID-19 literature annotation task at BioCreative VII. We proposed an approach that exploits the knowledge of globally non-optimal weights, which are usually rejected, to build a rich representation of each label. Our proposed approach consists of two stages: (1) a bagging of various initializations of the training data that features weakly trained weights, and (2) a stacking of heterogeneous vocabulary models based on BERT and RoBERTa embeddings. The aggregation of these weak insights performs better than a classical globally efficient model. The purpose is to distill the richness of knowledge into a simpler and lighter model. Our system obtains an instance-based F1 of 92.96 and a label-based micro-F1 of 91.35.
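
The two stages named in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the BagBERT system: the probabilities are made up, the label names are hypothetical, and fixed weights stand in for whatever stacking model the authors actually trained.

```python
def bag(runs):
    """Bagging step: average per-label probabilities across runs of one model family."""
    n = len(runs)
    return [sum(run[i] for run in runs) / n for i in range(len(runs[0]))]

def stack(family_scores, weights):
    """Stacking step (simplified): weighted average of the bagged scores per family.
    Fixed weights are an assumption standing in for a learned meta-model."""
    total = sum(weights)
    n_labels = len(family_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, family_scores)) / total
        for i in range(n_labels)
    ]

def predict_labels(scores, label_names, threshold=0.5):
    """Multi-label decision: keep every label whose combined score meets the threshold."""
    return [name for name, score in zip(label_names, scores) if score >= threshold]

# Hypothetical labels and made-up probabilities from weakly trained runs.
LABEL_NAMES = ["Treatment", "Diagnosis", "Prevention"]
bert_runs = [[0.9, 0.2, 0.4], [0.7, 0.4, 0.6]]
roberta_runs = [[0.8, 0.1, 0.7], [0.6, 0.3, 0.5]]

combined = stack([bag(bert_runs), bag(roberta_runs)], weights=[1.0, 1.0])
predicted = predict_labels(combined, LABEL_NAMES)
```

Each individual run is noisy, but averaging within a family and then combining across families gives a steadier per-label score before thresholding.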
