Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jongyun Choi

LAME: Layout Aware Metadata Extraction Approach for Research Articles

Dec 23, 2021

Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heung-Seon Oh, Yuchul Jung

Figure 1 for LAME: Layout Aware Metadata Extraction Approach for Research Articles

Figure 2 for LAME: Layout Aware Metadata Extraction Approach for Research Articles

Figure 3 for LAME: Layout Aware Metadata Extraction Approach for Research Articles

Figure 4 for LAME: Layout Aware Metadata Extraction Approach for Research Articles

Abstract:The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers. To accommodate the diversity of the layouts of academic journals, we propose a novel LAyout-aware Metadata Extraction (LAME) framework equipped with the three characteristics (e.g., design of an automatic layout analysis, construction of a large meta-data training set, and construction of Layout-MetaBERT). We designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author name, author affiliated organization, and keywords, were automatically extracted. Moreover, we constructed Layout-MetaBERT to extract the metadata from academic journals with varying layout formats. The experimental results with Layout-MetaBERT exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats.

Via

Access Paper or Ask Questions