Abstract: Recent studies on StyleGAN variants show promising performance on various generation tasks. In these models, desired images have traditionally been obtained by manipulating and searching latent codes. However, this approach sometimes yields images that lack photorealism because little is known about the geometry of the trained latent space. In this paper, we present a simple unsupervised method that provides a well-trained local latent subspace, enabling latent code navigation while preserving the photorealism of the generated images. Specifically, the method identifies densely mapped regions of the latent space and restricts latent manipulations to the local latent subspace. Experimental results demonstrate that images generated within the local latent subspace maintain photorealism even when the latent codes are significantly and repeatedly manipulated. Moreover, experiments show that the method can be applied to latent code optimization for various types of style-based models. We expect this empirical evidence to benefit applications of style-based models.
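The abstract does not say how the local latent subspace is computed. As a hedged illustration of the general idea (restricting edits to a subspace estimated from densely sampled mapped codes), the sketch below uses plain local PCA: it samples many mapped codes, takes the k nearest neighbours of the current code, keeps the top principal directions, and projects each edit onto them. This is an assumed, generic technique, not the authors' procedure; `mapping_network`, `local_basis`, `restricted_edit`, the toy 3-D manifold, `k`, `dim`, and `step` are all hypothetical.

```python
import numpy as np

# Hypothetical toy "mapping network": codes z in R^8 are pushed onto a curved
# 3-D manifold embedded in R^16. This is a stand-in for a z -> w mapping,
# NOT a real StyleGAN model. Only the first 3 coordinates of z are used.
RNG = np.random.default_rng(0)
B = RNG.standard_normal((3, 16))


def mapping_network(z: np.ndarray) -> np.ndarray:
    return np.tanh(z[:, :3] @ B)


def local_basis(w0: np.ndarray, samples: np.ndarray, k: int = 64, dim: int = 3) -> np.ndarray:
    """Orthonormal basis (dim x D) of the local subspace around w0,
    estimated by PCA over the k nearest mapped samples (assumed approach)."""
    dists = np.linalg.norm(samples - w0, axis=1)
    neighbours = samples[np.argsort(dists)[:k]]
    centred = neighbours - neighbours.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:dim]


def restricted_edit(w0: np.ndarray, direction: np.ndarray, basis: np.ndarray, step: float) -> np.ndarray:
    """Move w0 along `direction` after projecting it onto span(basis)."""
    projected = basis.T @ (basis @ direction)
    return w0 + step * projected


# Densely sample the mapped latent space, then edit one code repeatedly,
# re-estimating the local subspace after every step.
W = mapping_network(RNG.standard_normal((5000, 8)))
w = W[0].copy()
direction = RNG.standard_normal(16)
for _ in range(10):
    basis = local_basis(w, W)
    w = restricted_edit(w, direction, basis, step=0.02)
```

Re-estimating the basis at every step is one way to keep repeated edits close to the densely populated region; a single global PCA would be cheaper but would ignore the curvature of the mapped space.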
Abstract: Predicting image memorability has attracted interest in various fields. Accordingly, the prediction accuracy of convolutional neural network (CNN) models has been approaching the empirical upper bound estimated from human consistency. However, which feature representations embedded in CNN models are responsible for this high prediction accuracy remains an open question. To tackle this problem, this study sought to identify memorability-related feature representations in CNN models using brain similarity. Specifically, memorability prediction accuracy and brain similarity, the latter assessed with Brain-Score, were examined across 16,860 layers in 64 CNN models pretrained for object recognition. This comprehensive analysis revealed a clear tendency: layers with high memorability prediction accuracy had higher brain similarity with the inferior temporal (IT) cortex, the highest stage of the ventral visual pathway. Furthermore, fine-tuning the 64 CNN models revealed that brain similarity with the IT cortex at the penultimate layer was positively correlated with memorability prediction accuracy. This analysis also showed that the best fine-tuned model achieved accuracy comparable to that of state-of-the-art CNN models developed specifically for memorability prediction. Overall, the results indicate that the great success of CNN models in predicting memorability relies on acquiring feature representations similar to those of the IT cortex. This study advances our understanding of these feature representations and their use in predicting image memorability.
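The abstract does not state how per-layer memorability prediction accuracy is measured. A common setup in memorability work, assumed here, is a linear readout (ridge regression) from a layer's activations to memorability scores, scored by Spearman's rank correlation on held-out images. The sketch below implements that on synthetic data only; `layer_accuracy`, `ridge_predict`, `alpha`, the train/test split, and the two toy "layers" are hypothetical, and no real CNN activations or Brain-Score calls are involved.

```python
import numpy as np


def rankdata(x) -> np.ndarray:
    """1-based ranks with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


def ridge_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression on centred data (assumed linear readout)."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y_train - mu_y))
    return (X_test - mu_x) @ coef + mu_y


def layer_accuracy(features, scores, n_train=500, alpha=1.0) -> float:
    """Hypothetical per-layer accuracy: Spearman's rho between held-out
    predictions and ground-truth memorability scores."""
    pred = ridge_predict(features[:n_train], scores[:n_train], features[n_train:], alpha)
    return spearman(pred, scores[n_train:])


# Synthetic stand-ins (NOT real data): 600 "images" with memorability scores
# and two "layers" whose 50-D features carry a weak or strong score signal.
rng = np.random.default_rng(0)
scores = rng.uniform(0.4, 1.0, 600)
v = rng.standard_normal(50)
accuracy = {}
for name, strength in {"weak": 0.05, "strong": 2.0}.items():
    feats = strength * scores[:, None] * v + rng.standard_normal((600, 50))
    accuracy[name] = layer_accuracy(feats, scores)
```

In an across-layer analysis like the one described, the same `spearman` helper could then correlate the per-layer accuracies with per-layer IT similarity scores obtained separately (for example, from Brain-Score); how those scores are computed is outside this sketch.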