Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Jun 19, 2024

Rushikesh Zawar, Shaurya Dewan, Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

Figure 1 for StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Figure 2 for StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Figure 3 for StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Figure 4 for StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Share this with someone who'll enjoy it:

Abstract:Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. These frameworks account for the visual variability of objects, as well as complex object co-occurrences and sources of noise such as diverse lighting conditions. By leveraging large-scale datasets and cross-attention conditioning, these models generate detailed and contextually rich scene representations. This capability opens new avenues for improving object recognition and scene understanding in varied and challenging environments. Our work presents StableSemantics, a dataset comprising 224 thousand human-curated prompts, processed natural language captions, over 2 million synthetic images, and 10 million attention maps corresponding to individual noun chunks. We explicitly leverage human-generated prompts that correspond to visually interesting stable diffusion generations, provide 10 generations per phrase, and extract cross-attention maps for each image. We explore the semantic distribution of generated images, examine the distribution of objects within images, and benchmark captioning and open vocabulary segmentation methods on our data. To the best of our knowledge, we are the first to release a diffusion dataset with semantic attributions. We expect our proposed dataset to catalyze advances in visual semantic understanding and provide a foundation for developing more sophisticated and effective visual models. Website: https://stablesemantics.github.io/StableSemantics

* Dataset website: https://stablesemantics.github.io/StableSemantics

View paper on

Share this with someone who'll enjoy it:

Title:StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Paper and Code