Alena Tsanda

Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers

May 13, 2024

Alena Tsanda, Elena Bruches


Abstract: The paper describes the creation of a multimodal dataset of Russian-language scientific papers and the evaluation of existing language models on the task of automatic text summarization. The dataset is distinguished by its multimodal content, which includes texts, tables, and figures. The paper reports the results of experiments with two language models: GigaChat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available at https://github.com/iis-research-team/summarization-dataset.

* 12 pages, accepted to AINL
