Title: A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

Mar 16, 2024

Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

Abstract: While Multimodal Large Language Models (MLLMs) have advanced significantly in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored. In this paper, we conduct a comprehensive and systematic study of prompting MLLMs for IQA. Specifically, we first investigate nine prompting systems for MLLMs, formed by combining three standardized testing procedures in psychophysics (i.e., the single-stimulus, double-stimulus, and multiple-stimulus methods) with three popular prompting strategies in natural language processing (i.e., standard, in-context, and chain-of-thought prompting). We then present a difficult sample selection procedure, which takes into account sample diversity and uncertainty, to further challenge MLLMs equipped with their respective optimal prompting systems. We assess three open-source MLLMs and one closed-source MLLM on several visual attributes of image quality (e.g., structural and textural distortions, color differences, and geometric transformations) in both full-reference and no-reference scenarios. Experimental results show that only the closed-source GPT-4V provides a reasonable account of human perception of image quality, but it is weak at discriminating fine-grained quality variations (e.g., color differences) and at comparing the visual quality of multiple images, tasks humans can perform effortlessly.
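The nine prompting systems are the Cartesian product of the two three-item lists named in the abstract: three psychophysical testing procedures times three prompting strategies. The Python sketch below only enumerates that 3 x 3 grid. The identifiers `TESTING_PROCEDURES`, `PROMPTING_STRATEGIES`, and `prompting_systems` are hypothetical names chosen for illustration, not code released with the paper, and the sketch does not attempt to reproduce the authors' actual prompts.

```python
from itertools import product

# Hypothetical names: the three psychophysical testing procedures listed in the abstract.
TESTING_PROCEDURES = ("single-stimulus", "double-stimulus", "multiple-stimulus")

# Hypothetical names: the three prompting strategies listed in the abstract.
PROMPTING_STRATEGIES = ("standard", "in-context", "chain-of-thought")


def prompting_systems():
    """Return every (procedure, strategy) pair, i.e. the 3 x 3 grid of prompting systems."""
    return list(product(TESTING_PROCEDURES, PROMPTING_STRATEGIES))


if __name__ == "__main__":
    systems = prompting_systems()
    for procedure, strategy in systems:
        print(f"{procedure} + {strategy}")
    print(f"{len(systems)} prompting systems in total")
```

Each pair names one prompting system, for example `("double-stimulus", "chain-of-thought")`; how the paper turns each pair into a concrete prompt is not described in the abstract.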
