Gustavo Bartz Guedes

SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Aug 29, 2024

Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira

Abstract: Document summarization is the task of condensing texts into concise, informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.

* 15 pages, 6 figures, 1 table. Submitted to BRACIS 2024

Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning

Mar 30, 2024

Gustavo Bartz Guedes, Ana Estela Antunes da Silva

Abstract: Scientific articles are long text documents organized into sections, each describing aspects of the research. Analyzing scientific production has become progressively more challenging as the number of available articles grows. In this scenario, our approach consisted of fine-tuning transformer language models to generate sentence-level embeddings from scientific articles, considering the following labels: background, objective, methods, results, and conclusion. We trained our models on three datasets with contrastive learning. Two datasets are drawn from article abstracts in the computer science and medical domains. We also introduce PMC-Sents-FULL, a novel dataset of sentences extracted from the full texts of medical articles. To evaluate our approach, we compare the fine-tuned and baseline models in clustering and classification tasks. On average, the values of the clustering agreement measures were five times higher. For the classification measures, in the best-case scenario, we had an average improvement in F1-micro of 30.73%. Results show that fine-tuning sentence transformers with contrastive learning and using the generated embeddings in downstream tasks is a feasible approach to sentence classification in scientific articles. Our experiment code is available on GitHub.

* Computer Science & Information Technology (CS & IT), pp. 293-305, 2023
