Ali Abdollahi

T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation

Mar 14, 2025

Seyed Mohammad Hadi Hosseini, Amir Mohammad Izadi, Ali Abdollahi, Armin Saghafian, Mahdieh Soleymani Baghshah

Abstract: Although recent text-to-image generative models have achieved impressive performance, they still often struggle to capture the compositional complexities of prompts, including attribute binding and spatial relationships between different entities. This misalignment is not revealed by common evaluation metrics such as CLIPScore. Recent works have proposed evaluation metrics that use Visual Question Answering (VQA), decomposing prompts into questions about the generated image for more robust compositional evaluation. Although these methods align better with human evaluations, they still fail to fully cover the compositionality within the image. To address this, we propose a novel metric that breaks down images into components, and texts into fine-grained questions about the generated image for evaluation. Our method outperforms previous state-of-the-art metrics, demonstrating its effectiveness in evaluating text-to-image generative models. Code is available at https://github.com/hadi-hosseini/T2I-FineEval.
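As a rough illustration of how a component-and-question metric could be aggregated, the sketch below assumes a table of hypothetical VQA scores, one per (question, image component) pair. The name `fine_grained_score` and the aggregation rule (best-matching component per question, then the mean over questions) are assumptions made for illustration; the abstract does not specify how the authors combine scores.

```python
def fine_grained_score(match_scores):
    """Aggregate per-question, per-component scores into one value.

    match_scores[q][c] is a hypothetical score in [0, 1] saying how well
    image component c satisfies fine-grained question q.
    Assumed rule (not from the paper): take the best component for each
    question, then average over questions.
    """
    if not match_scores or any(len(row) == 0 for row in match_scores):
        raise ValueError("need at least one question and one component")
    per_question = [max(row) for row in match_scores]
    return sum(per_question) / len(per_question)


# Three questions scored against two image components (made-up numbers).
scores = [
    [0.9, 0.1],  # e.g. "Is the cube red?"
    [0.2, 0.8],  # e.g. "Is there a dog?"
    [0.1, 0.3],  # e.g. "Is the dog left of the cube?"
]
print(round(fine_grained_score(scores), 3))
```

Taking the maximum over components lets each question be answered by whichever region of the image is most relevant; other choices, such as a learned weighting, would fit the same interface.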

* Accepted at ECCV 2024 Workshop EVAL-FoMo

Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation

Mar 09, 2025

Amir Mohammad Izadi, Seyed Mohammad Hadi Hosseini, Soroush Vafaie Tabar, Ali Abdollahi, Armin Saghafian, Mahdieh Soleymani Baghshah

Abstract: Text-to-image generative models have made significant advancements in recent years; however, accurately capturing intricate details in textual prompts, while avoiding missing entities, attribute binding errors, and incorrect relationships, remains a formidable challenge. In response, we present an innovative, training-free method that directly addresses these challenges by incorporating tailored objectives to account for textual constraints. Unlike layout-based approaches that enforce rigid structures and limit diversity, our proposed approach offers a more flexible arrangement of the scene by imposing just the extracted constraints from the text, without any unnecessary additions. These constraints are formulated as losses (entity missing, entity mixing, attribute binding, and spatial relationships) and integrated into a unified loss that is applied in the first generation stage. Furthermore, we introduce a feedback-driven system for fine-grained initial noise refinement. This system integrates a verifier that evaluates the generated image, identifies inconsistencies, and provides corrective feedback. Leveraging this feedback, our refinement method first targets the unmet constraints by refining the faulty attention maps caused by initial noise, through the optimization of selective losses associated with these constraints. Subsequently, our unified loss function is reapplied to carry out the second generation phase. Experimental results demonstrate that our method, relying solely on our proposed objective functions, significantly enhances compositionality, achieving a 24% improvement in human evaluation and a 25% gain in spatial relationships. Furthermore, our fine-grained noise refinement proves effective, boosting performance by up to 5%. Code is available at https://github.com/hadi-hosseini/noise-refinement.

GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models

Jul 30, 2024

Ali Abdollahi, Mahdi Ghaznavi, Mohammad Reza Karimi Nejad, Arash Mari Oriyad, Reza Abbasi, Ali Salesi, Melika Behjati, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah

Abstract: Vision-language models (VLMs) are intensively used in many downstream tasks, including those requiring assessments of individuals appearing in the images. While VLMs perform well in simple single-person scenarios, in real-world applications, we often face complex situations in which there are persons of different genders doing different activities. We show that in such cases, VLMs are biased towards identifying the individual with the expected gender (according to ingrained gender stereotypes in the model or other forms of sample selection bias) as the performer of the activity. We refer to this bias in associating an activity with the gender of its actual performer in an image or text as the Gender-Activity Binding (GAB) bias and analyze how this bias is internalized in VLMs. To assess this bias, we have introduced the GAB dataset with approximately 5500 AI-generated images that represent a variety of activities, addressing the scarcity of real-world images for some scenarios. For quality control, the generated images are evaluated for their diversity, quality, and realism. We have tested 12 renowned pre-trained VLMs on this dataset in the context of text-to-image and image-to-text retrieval to measure the effect of this bias on their predictions. Additionally, we have carried out supplementary experiments to quantify the bias in VLMs' text encoders and to evaluate VLMs' capability to recognize activities. Our experiments indicate that VLMs experience an average performance decline of about 13.2% when confronted with gender-activity binding bias.

Data-driven Thermal Anomaly Detection for Batteries using Unsupervised Shape Clustering

Mar 16, 2021

Xiaojun Li, Jianwei Li, Ali Abdollahi, Trevor Jones, Asif Habeebullah

Abstract: For electric vehicle (EV) and energy storage (ES) batteries, thermal runaway is a critical issue as it can lead to uncontrollable fires or even explosions. Thermal anomaly detection can identify problematic battery packs that may eventually undergo thermal runaway. However, there are common challenges like data unavailability, environment variations, and battery aging. We propose a data-driven method to detect battery thermal anomalies by comparing the shape similarity between thermal measurements. Based on their shapes, the measurements are continuously grouped into different clusters. Anomalies are detected by monitoring deviations within the clusters. Unlike model-based or other data-driven methods, the proposed method is robust to data loss and requires minimal reference data for different pack configurations. As the initial experimental results show, the method not only can be more accurate than the onboard BMS, but also can detect unforeseen anomalies at an early stage.
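The shape-based grouping described above can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes z-normalized Euclidean distance as the shape-similarity measure, a simple greedy leader clustering, and a rule that flags every curve outside the largest cluster. The function names, the threshold, and the temperature values are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Remove offset and scale so only the shape of the curve remains."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def shape_distance(a, b):
    """Euclidean distance between the z-normalized curves (assumed measure)."""
    za, zb = z_normalize(a), z_normalize(b)
    return sum((x - y) ** 2 for x, y in zip(za, zb)) ** 0.5


def cluster_by_shape(curves, threshold):
    """Greedy leader clustering: a curve joins the first cluster whose
    leader (first member) is within `threshold`, else starts a new one."""
    clusters = []  # each cluster is a list of curve indices
    for i, curve in enumerate(curves):
        for members in clusters:
            if shape_distance(curve, curves[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def flag_anomalies(curves, threshold):
    """Assumed rule: curves outside the largest cluster are anomalous."""
    clusters = cluster_by_shape(curves, threshold)
    largest = max(clusters, key=len)
    return sorted(i for members in clusters if members is not largest
                  for i in members)


# Made-up temperature traces (degrees C) from four sensors in one pack.
curves = [
    [20, 21, 22, 23, 24],
    [30, 31, 32, 33, 34],  # hotter location, same shape
    [25, 26, 27, 28, 29],
    [20, 20, 20, 21, 40],  # sudden rise at the end
]
print(flag_anomalies(curves, threshold=0.5))
```

Normalizing each trace before comparison is what makes the sketch insensitive to a constant temperature offset between sensors, which loosely mirrors the abstract's point about environment variations.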

* 5 pages
