Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fatemeh Behrad

Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

Apr 03, 2025

Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans

Figure 1 for Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

Figure 2 for Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

Figure 3 for Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

Figure 4 for Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

Abstract:The capacity of Vision transformers (ViTs) to handle variable-sized inputs is often constrained by computational complexity and batch processing limitations. Consequently, ViTs are typically trained on small, fixed-size images obtained through downscaling or cropping. While reducing computational burden, these methods result in significant information loss, negatively affecting tasks like image aesthetic assessment. We introduce Charm, a novel tokenization approach that preserves Composition, High-resolution, Aspect Ratio, and Multi-scale information simultaneously. Charm prioritizes high-resolution details in specific regions while downscaling others, enabling shorter fixed-size input sequences for ViTs while incorporating essential information. Charm is designed to be compatible with pre-trained ViTs and their learned positional embeddings. By providing multiscale input and introducing variety to input tokens, Charm improves ViT performance and generalizability for image aesthetic assessment. We avoid cropping or changing the aspect ratio to further preserve information. Extensive experiments demonstrate significant performance improvements on various image aesthetic and quality assessment datasets (up to 8.1 %) using a lightweight ViT backbone. Code and pre-trained models are available at https://github.com/FBehrad/Charm.

* CVPR 2025

Via

Access Paper or Ask Questions

Investigating the Gestalt Principle of Closure in Deep Convolutional Neural Networks

Nov 01, 2024

Yuyan Zhang, Derya Soydaner, Fatemeh Behrad, Lisa Koßmann, Johan Wagemans

Abstract:Deep neural networks perform well in object recognition, but do they perceive objects like humans? This study investigates the Gestalt principle of closure in convolutional neural networks. We propose a protocol to identify closure and conduct experiments using simple visual stimuli with progressively removed edge sections. We evaluate well-known networks on their ability to classify incomplete polygons. Our findings reveal a performance degradation as the edge removal percentage increases, indicating that current models heavily rely on complete edge information for accurate classification. The data used in our study is available on Github.

* Published at the ESANN 2024 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium) and online event, 9-11 October 2024

Via

Access Paper or Ask Questions

BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment

Aug 26, 2024

Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten, Derya Soydaner, Johan Wagemans

Abstract:Assessing the aesthetic quality of artistic images presents unique challenges due to the subjective nature of aesthetics and the complex visual characteristics inherent to artworks. Basic data augmentation techniques commonly applied to natural images in computer vision may not be suitable for art images in aesthetic evaluation tasks, as they can change the composition of the art images. In this paper, we explore the impact of local and global data augmentation techniques on artistic image aesthetic assessment (IAA). We introduce BackFlip, a local data augmentation technique designed specifically for artistic IAA. We evaluate the performance of BackFlip across three artistic image datasets and four neural network architectures, comparing it with the commonly used data augmentation techniques. Then, we analyze the effects of components within the BackFlip pipeline through an ablation study. Our findings demonstrate that local augmentations, such as BackFlip, tend to outperform global augmentations on artistic IAA in most cases, probably because they do not perturb the composition of the art images. These results emphasize the importance of considering both local and global augmentations in future computational aesthetics research.

* Published at the VISART VII workshop at ECCV 2024. Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten and Derya Soydaner contributed equally to this work

Via

Access Paper or Ask Questions

Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

Aug 22, 2024

Yuyan Zhang, Derya Soydaner, Lisa Koßmann, Fatemeh Behrad, Johan Wagemans

Figure 1 for Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

Figure 2 for Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

Figure 3 for Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

Figure 4 for Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

Abstract:The human brain has an inherent ability to fill in gaps to perceive figures as complete wholes, even when parts are missing or fragmented. This phenomenon is known as Closure in psychology, one of the Gestalt laws of perceptual organization, explaining how the human brain interprets visual stimuli. Given the importance of Closure for human object recognition, we investigate whether neural networks rely on a similar mechanism. Exploring this crucial human visual skill in neural networks has the potential to highlight their comparability to humans. Recent studies have examined the Closure effect in neural networks. However, they typically focus on a limited selection of Convolutional Neural Networks (CNNs) and have not reached a consensus on their capability to perform Closure. To address these gaps, we present a systematic framework for investigating the Closure principle in neural networks. We introduce well-curated datasets designed to test for Closure effects, including both modal and amodal completion. We then conduct experiments on various CNNs employing different measurements. Our comprehensive analysis reveals that VGG16 and DenseNet-121 exhibit the Closure effect, while other CNNs show variable results. We interpret these findings by blending insights from psychology and neural network research, offering a unique perspective that enhances transparency in understanding neural networks. Our code and dataset will be made available on GitHub.

Via

Access Paper or Ask Questions