Abstract: In this work we propose a photorealistic style transfer method for image and video that is based on vision science principles and on a recent mathematical formulation for the deterministic decoupling of sample statistics. The novel aspects of our approach include matching decoupled moments of higher order than in common style transfer approaches, and matching a descriptor of the power spectrum so as to characterize and transfer diffusion effects between source and target, which has not been considered before in the literature. The results are of high visual quality, without spatio-temporal artifacts, and validation tests in the form of observer preference experiments show that our method compares very well with the state of the art. The computational complexity of the algorithm is low, and we propose a numerical implementation that is amenable to real-time video application. Finally, another contribution of our work is to point out that current deep learning approaches for photorealistic style transfer do not really achieve photorealistic quality outside of limited examples, because the results too often show unacceptable visual artifacts.
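As a point of reference for the moment-matching idea mentioned in the abstract above, the following is a minimal sketch of the common low-order baseline: shifting and scaling one channel so that its mean and standard deviation equal those of a style channel. It is not the method of the paper, which matches decoupled higher-order moments and a power spectrum descriptor. The function name `match_mean_std`, the `eps` guard, and the flat-list representation of a channel are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def match_mean_std(source, style, eps=1e-12):
    """Affinely remap `source` so its mean and standard deviation match `style`.

    Both arguments are flat sequences of pixel values for a single channel.
    This is the classic first- and second-moment baseline only (an assumption
    for illustration); it does not implement decoupled higher-order moments.
    """
    src_mean, src_std = fmean(source), pstdev(source)
    sty_mean, sty_std = fmean(style), pstdev(style)
    # eps avoids division by zero when the source channel is constant.
    gain = sty_std / (src_std + eps)
    return [(value - src_mean) * gain + sty_mean for value in source]


if __name__ == "__main__":
    source_channel = [0.1, 0.2, 0.4, 0.7]
    style_channel = [0.5, 0.9, 0.6, 0.8, 1.0]
    result = match_mean_std(source_channel, style_channel)
    print(fmean(result), pstdev(result))
```

For a colour image or a video frame, the same function would be applied to each channel independently. The map is a single affine transform per channel, which is one reason statistics-based transfer of this kind is cheap to compute.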
Abstract: Visual illusions are a very useful tool for vision scientists, because they allow them to better probe the limits, thresholds and errors of the visual system. In this work we introduce the first ever framework to generate novel visual illusions with an artificial neural network (ANN). It takes the form of a generative adversarial network, with a generator of visual illusion candidates and two discriminator modules, one for the inducer background and another that decides whether or not the candidate is indeed an illusion. The generality of the model is exemplified by synthesizing illusions of different types, and validated with psychophysical experiments that corroborate that the outputs of our ANN are indeed visual illusions to human observers. Apart from synthesizing new visual illusions, which may help vision researchers, the proposed model has the potential to open new ways to study the similarities and differences between ANN and human visual perception.
Abstract: We consider the evolution model proposed in [9, 6] to describe illusory contrast perception phenomena induced by surrounding orientations. Firstly, we highlight its analogies and differences with the widely used Wilson-Cowan equations [48], mainly in terms of efficient representation properties. Then, in order to explicitly encode local directional information, we exploit the model of the primary visual cortex V1 proposed in [20] and widely used in recent years for several image processing problems [24,38,28]. The resulting model is capable of describing assimilation and contrast visual bias at the same time, the main novelty being its explicit dependence on local image orientation. We report several numerical tests showing the ability of the model to explain, in particular, orientation-dependent phenomena such as grating induction and a modified version of the Poggendorff illusion. For this latter example, we empirically show the existence of a set of threshold parameters that separates inpainting-type from perception-type reconstructions, describing long-range connectivity between different hypercolumns in the primary visual cortex.
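For orientation, Wilson-Cowan-type equations such as those referred to in the abstract above are commonly written in the following single-population form. The notation is an assumption based on a common convention and is not taken from the abstract: $a$ is the neural activation, $\beta > 0$ a decay rate, $\nu$ a coupling strength, $\omega$ an interaction kernel, $\sigma$ a sigmoid nonlinearity, and $h$ the external input. The cited works may use different symbols or normalizations.

```latex
\frac{\partial a}{\partial t}(x,t)
  = -\beta\, a(x,t)
  + \nu \int_{\Omega} \omega(x,y)\, \sigma\bigl(a(y,t)\bigr)\, dy
  + h(x,t)
```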
Abstract: We consider a differential model describing neuro-physiological contrast perception phenomena induced by surrounding orientations. The mathematical formulation relies on a cortical-inspired model [10] that has been widely used in recent years to describe neuron interactions in the primary visual cortex (V1) and applied to several image processing problems [12,19,13]. Our model connects to Wilson-Cowan-type equations [23] and is analogous to the one used in [3,2,14] to describe assimilation and contrast phenomena, the main novelty being its explicit dependence on local image orientation. To confirm the validity of the model, we report some numerical tests showing its ability to explain orientation-dependent phenomena (such as grating induction) and geometric-optical illusions [21,16] classically explained only by filtering-based techniques [6,18].
Abstract: Visual illusions teach us that what we see is not always what is represented in the physical world. Their special nature makes them a fascinating tool to test and validate any new vision model. In general, current vision models are based on the concatenation of linear convolutions and non-linear operations. In this paper we draw inspiration from the similarity of this structure to the operations present in Convolutional Neural Networks (CNNs). This motivated us to study whether CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with architecture and spatial pattern size. We believe that this CNN behaviour appears as a by-product of the training for the low-level vision tasks of denoising, color constancy or deblurring. Our work opens a new bridge between human perception and CNNs: in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions.
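The structure mentioned in the abstract above, a concatenation of linear convolutions and non-linear operations, can be illustrated with a small one-dimensional sketch. Everything here is an assumption chosen for illustration: the function names, the zero padding, the odd-length kernels, and the use of `tanh` as the pointwise nonlinearity. It is not one of the vision models or CNNs studied in the paper.

```python
import math


def convolve_same(signal, kernel):
    """Convolve `signal` with an odd-length `kernel`, using zero padding.

    The output has the same length as `signal`.
    """
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k  # input index paired with kernel tap k
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        output.append(acc)
    return output


def linear_nonlinear_stage(signal, kernel):
    """One stage: a linear convolution followed by a pointwise tanh."""
    return [math.tanh(value) for value in convolve_same(signal, kernel)]


def cascade(signal, kernels):
    """Apply one linear-nonlinear stage per kernel, in order."""
    for kernel in kernels:
        signal = linear_nonlinear_stage(signal, kernel)
    return signal


if __name__ == "__main__":
    step_edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    center_surround = [-0.25, 1.0, -0.25]  # hypothetical kernel
    print(cascade(step_edge, [center_surround, center_surround]))
```

A CNN layer has the same shape, a learned convolution followed by a pointwise nonlinearity, which is the structural similarity the abstract points to.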