Title: Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models

Jul 15, 2024

Rining Wu, Feixiang Zhou, Ziwei Yin, Jian K. Liu

Abstract: Our brains represent the ever-changing environment with neurons in a highly dynamic fashion. The temporal features of visual pixels in dynamic natural scenes are captured in the neuronal responses of the retina, so it is crucial to establish the intrinsic temporal relationship between visual pixels and neuronal responses. Recent foundation vision models have advanced the understanding of image pixels, yet how neuronal coding in the brain aligns with pixels remains poorly understood. Most previous studies use static images, or artificial videos derived from static images, as a stand-in for more realistic and complex stimuli. Although these simple scenarios help to isolate key factors influencing visual coding, they leave complex temporal relationships unaddressed. To decompose the temporal features of visual coding in natural scenes, we propose Vi-ST, a spatiotemporal convolutional neural network fed with a self-supervised Vision Transformer (ViT) prior, aimed at unraveling the temporal-based encoding patterns of retinal neuronal populations. The model demonstrates robust predictive performance in generalization tests. Through detailed ablation experiments, we demonstrate the significance of each temporal module. We also introduce a visual coding evaluation metric designed to integrate temporal considerations, and we compare the impact of different numbers of neuronal populations on complementary coding. In conclusion, Vi-ST provides a novel modeling framework for neuronal coding of dynamic visual scenes in the brain, effectively aligning our brain's representation of video with neuronal activity. The code is available at https://github.com/wurining/Vi-ST.
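To make the general idea in the abstract concrete, here is a minimal sketch of a temporal encoding model driven by per-frame image features. It is not the Vi-ST architecture: the function names (`causal_temporal_conv`, `predict_rates`), the shapes, the depthwise causal kernel, the linear readout, and the softplus output are all assumptions chosen for illustration, and random arrays stand in for the features a frozen, self-supervised ViT would produce for each video frame. See the authors' repository for the actual model.

```python
import numpy as np

def causal_temporal_conv(feats, kernel):
    """Depthwise causal convolution along time.

    feats:  (T, D) per-frame features (hypothetical stand-in for ViT features).
    kernel: (K, D) temporal filter; kernel[-1] weights the current frame.
    Returns (T, D); output at time t depends only on frames t-K+1 .. t.
    """
    T, D = feats.shape
    K = kernel.shape[0]
    # Left-pad with zeros so the first outputs see "empty" history.
    padded = np.concatenate([np.zeros((K - 1, D)), feats], axis=0)
    out = np.zeros((T, D))
    for t in range(T):
        window = padded[t:t + K]  # frames t-K+1 .. t
        out[t] = (window * kernel).sum(axis=0)
    return out

def predict_rates(frame_feats, kernel, readout_w, readout_b):
    """Map frame features to non-negative firing rates for N neurons."""
    hidden = causal_temporal_conv(frame_feats, kernel)
    drive = hidden @ readout_w + readout_b
    return np.logaddexp(0.0, drive)  # softplus keeps rates >= 0

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
T, D, K, N = 20, 16, 5, 8  # frames, feature dim, kernel length, neurons
frame_feats = rng.normal(size=(T, D))  # placeholder for frozen ViT features
kernel = rng.normal(size=(K, D)) * 0.1
readout_w = rng.normal(size=(D, N)) * 0.1
readout_b = np.zeros(N)

rates = predict_rates(frame_feats, kernel, readout_w, readout_b)
print(rates.shape)  # (20, 8): one rate per frame per neuron
```

The causal padding is the design choice worth noting: each predicted response depends only on the current and preceding frames, which is one simple way to express the temporal relationship between pixels and neuronal responses that the abstract emphasizes.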

* This article has been accepted at ECCV 2024 with paper ID 12149. Accepted paper IDs are listed at: https://eccv2024.ecva.net/Conferences/2024/AcceptedPapers
