Leixin Zhang

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?

Dec 03, 2024

Leixin Zhang, Steffen Eger, Yinjie Cheng, Weihe Zhai, Jonas Belouadi, Christoph Leiter, Simone Paolo Ponzetto, Fahimeh Moafian, Zhixue Zhao

Abstract: Multimodal large language models (LLMs) have demonstrated impressive capabilities in generating high-quality images from textual instructions. However, their performance in generating scientific images--a critical application for accelerating scientific progress--remains underexplored. In this work, we address this gap by introducing ScImage, a benchmark designed to evaluate the multimodal capabilities of LLMs in generating scientific images from textual descriptions. ScImage assesses three key dimensions of understanding: spatial, numeric, and attribute comprehension, as well as their combinations, focusing on the relationships between scientific objects (e.g., squares, circles). We evaluate five models, GPT-4o, Llama, AutomaTikZ, Dall-E, and StableDiffusion, using two modes of output generation: code-based outputs (Python, TikZ) and direct raster image generation. Additionally, we examine four different input languages: English, German, Farsi, and Chinese. Our evaluation, conducted with 11 scientists across three criteria (correctness, relevance, and scientific accuracy), reveals that while GPT-4o produces outputs of decent quality for simpler prompts involving individual dimensions such as spatial, numeric, or attribute understanding in isolation, all models face challenges in this task, especially for more complex prompts.

Tübingen-CL at SemEval-2024 Task 1: Ensemble Learning for Semantic Relatedness Estimation

Oct 14, 2024

Leixin Zhang, Çağrı Çöltekin

Abstract: The paper introduces our system for SemEval-2024 Task 1, which aims to predict the relatedness of sentence pairs. Operating under the hypothesis that semantic relatedness is a broader concept that extends beyond mere similarity of sentences, our approach seeks to identify useful features for relatedness estimation. We employ an ensemble approach integrating various systems, including statistical textual features and outputs of deep learning models to predict relatedness scores. The findings suggest that semantic relatedness can be inferred from various sources and ensemble models outperform many individual systems in estimating semantic relatedness.

* Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
* 5 pages
