Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Aug 29, 2023

Shuxiao Ma, Linyuan Wang, Bin Yan

Figure 1 for A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Figure 2 for A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Figure 3 for A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Figure 4 for A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Share this with someone who'll enjoy it:

Abstract:Biological research has revealed that the verbal semantic information in the brain cortex, as an additional source, participates in nonverbal semantic tasks, such as visual encoding. However, previous visual encoding models did not incorporate verbal semantic information, contradicting this biological finding. This paper proposes a multimodal visual information encoding network model based on stimulus images and associated textual information in response to this issue. Our visual information encoding network model takes stimulus images as input and leverages textual information generated by a text-image generation model as verbal semantic information. This approach injects new information into the visual encoding model. Subsequently, a Transformer network aligns image and text feature information, creating a multimodal feature space. A convolutional network then maps from this multimodal feature space to voxel space, constructing the multimodal visual information encoding network model. Experimental results demonstrate that the proposed multimodal visual information encoding network model outperforms previous models under the exact training cost. In voxel prediction of the left hemisphere of subject 1's brain, the performance improves by approximately 15.87%, while in the right hemisphere, the performance improves by about 4.6%. The multimodal visual encoding network model exhibits superior encoding performance. Additionally, ablation experiments indicate that our proposed model better simulates the brain's visual information processing.

View paper on

Share this with someone who'll enjoy it:

Title:A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Paper and Code