Title: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

Jun 13, 2024

Jiahao Nie, Gongjie Zhang, Wenbin An, Yap-Peng Tan, Alex C. Kot, Shijian Lu

Abstract: Despite recent advances in Multi-modal Large Language Models (MLLMs), understanding inter-object relations, i.e., interactions or associations between distinct objects, remains a major challenge for such models. This issue significantly hinders their advanced reasoning capabilities and is primarily due to the lack of large-scale, high-quality, and diverse multi-modal data essential for training and evaluating MLLMs. In this paper, we provide a taxonomy of inter-object relations and introduce Multi-Modal Relation Understanding (MMRel), a comprehensive dataset designed to bridge this gap by providing large-scale, high-quality, and diverse data for studying inter-object relations with MLLMs. MMRel features three distinctive attributes: (i) it includes over 15K question-answer pairs sourced from three distinct domains, ensuring large scale and high diversity; (ii) it contains a subset featuring highly unusual relations, which is very challenging because MLLMs often fail on it due to hallucinations; (iii) it provides manually verified, high-quality labels for inter-object relations. Thanks to these features, MMRel is well suited both for evaluating MLLMs on relation understanding and for fine-tuning MLLMs to enhance relation understanding and even to benefit overall performance on various vision-language tasks. Extensive experiments on various popular MLLMs validate the effectiveness of MMRel. Both the MMRel dataset and the complete labeling scripts have been made publicly available.
