Title: SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Jan 24, 2024

Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen(+4 more)

Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field in which significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing show a notable gap in the scientific domain: chart and table images described in scholarly language usually do not play a significant role. To bridge this gap, we develop a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents. We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines. We conduct zero-shot and fine-tuning evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP and BLIP. Our analysis offers critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the influence of the visual and textual encoders. All our data and checkpoints are publicly available at https://github.com/Wusiwei0410/SciMMIR.
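As a rough illustration of how an image-text retrieval benchmark of this kind can be scored, the sketch below computes two common ranking metrics from a query-by-candidate similarity matrix. It is an assumption-laden example, not the authors' evaluation code: the abstract does not name its metrics, so Recall@k and mean reciprocal rank are chosen here for illustration, and the function names and toy scores are hypothetical. The sketch assumes that query i is paired with candidate i, as in a list of caption-image pairs.

```python
from typing import List


def rank_of_match(scores: List[float], target: int) -> int:
    """Return the 1-based rank of the paired candidate for one query."""
    target_score = scores[target]
    # Rank = 1 + number of candidates scoring strictly higher than the match.
    return 1 + sum(1 for s in scores if s > target_score)


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Fraction of queries whose paired candidate appears in the top k."""
    hits = sum(1 for i, row in enumerate(sim) if rank_of_match(row, i) <= k)
    return hits / len(sim)


def mean_reciprocal_rank(sim: List[List[float]]) -> float:
    """Average of 1/rank of the paired candidate over all queries."""
    return sum(1.0 / rank_of_match(row, i) for i, row in enumerate(sim)) / len(sim)


# Hypothetical similarity scores: rows are text queries, columns are images.
# In practice these would come from an encoder such as CLIP (assumption).
sim = [
    [0.9, 0.1, 0.3],  # query 0: its pair (column 0) ranks first
    [0.2, 0.4, 0.8],  # query 1: its pair (column 1) ranks second
    [0.5, 0.6, 0.7],  # query 2: its pair (column 2) ranks first
]

recall_1 = recall_at_k(sim, 1)
recall_2 = recall_at_k(sim, 2)
mrr = mean_reciprocal_rank(sim)
print(f"Recall@1={recall_1:.3f} Recall@2={recall_2:.3f} MRR={mrr:.3f}")
```

Swapping rows and columns of the matrix gives the reverse direction (image queries against text candidates), so the same functions cover both retrieval directions under these assumptions.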
