Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong, open-source Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, a summary of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
Abstract: The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material, including examples of the reconstructed music, at https://google-research.github.io/seanet/brain2music.
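The voxel-wise encoding modeling analysis mentioned in the abstract above is a standard technique: a regularized linear model maps stimulus feature embeddings to the response of each fMRI voxel, and held-out prediction accuracy per voxel indicates where that feature space is represented. Below is a minimal sketch under assumed shapes, using synthetic placeholder data rather than the actual MusicLM embeddings or fMRI recordings:

```python
# Minimal voxel-wise encoding sketch: ridge regression from stimulus
# embeddings to per-voxel responses, scored by held-out correlation.
# All data here are synthetic stand-ins, not the study's actual data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 128, 1000

# Synthetic stimulus embeddings (rows = time points / stimuli).
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))

# Synthetic voxel responses from a ground-truth linear map plus noise.
W_true = rng.standard_normal((n_features, n_voxels))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_voxels))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_voxels))

# One ridge fit predicts all voxels jointly; with a shared alpha this is
# equivalent to fitting each voxel's regression independently.
model = Ridge(alpha=10.0).fit(X_train, Y_train)
Y_pred = model.predict(X_test)

def voxelwise_corr(y_true, y_pred):
    """Correlation between measured and predicted response, per voxel."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / (
        np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    )

scores = voxelwise_corr(Y_test, Y_pred)
print(scores.shape)  # one prediction-accuracy score per voxel
```

In practice the per-voxel scores are projected back onto the cortical surface to visualize which regions each model component predicts; that mapping step is omitted here.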
Abstract: The integration of deep learning and neuroscience has been advancing rapidly, which has led to improvements in the analysis of brain activity and the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity is an area that has particularly benefited: the use of deep learning models trained on large amounts of natural images has greatly improved its quality, and approaches that combine the diverse information contained in visual experiences have proliferated rapidly in recent years. In this technical paper, by taking advantage of the simple and generic framework that we proposed (Takagi and Nishimoto, CVPR 2023), we examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combined our earlier work with the following three techniques: using decoded text from brain activity, nonlinear optimization for structural image reconstruction, and using decoded depth information from brain activity. We confirmed that these techniques contributed to improving accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction using deep generative models trained on large datasets. Please check our webpage at https://sites.google.com/view/stablediffusion-with-brain/. Code is also available at https://github.com/yu-takagi/StableDiffusionReconstruction.
Abstract: Multi-regional interaction among neuronal populations underlies the brain's processing of rich sensory information in our daily lives. Recent neuroscience and neuroimaging studies have increasingly used naturalistic stimuli and experimental design to identify such realistic sensory computation in the brain. However, existing methods for cross-areal interaction analysis with dimensionality reduction, such as reduced-rank regression and canonical correlation analysis, have limited applicability and interpretability in naturalistic settings because they usually do not appropriately 'demix' neural interactions into those associated with different types of task parameters or stimulus features (e.g., visual or audio). In this paper, we develop a new method for cross-areal interaction analysis that uses the rich task or stimulus parameters to reveal how and what types of information are shared by different neural populations. The proposed neural demixed shared component analysis combines existing dimensionality reduction methods with a practical neural network implementation of functional analysis of variance with latent variables, thereby efficiently demixing nonlinear effects of continuous and multimodal stimuli. We also propose a simplifying alternative under the assumptions of linear effects and unimodal stimuli. To demonstrate our methods, we analyzed two human neuroimaging datasets of participants watching naturalistic videos of movies and dance movements. The results demonstrate that our methods provide new insights into multi-regional interaction in the brain during naturalistic sensory inputs, which cannot be captured by conventional techniques.
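Reduced-rank regression, named in the abstract above as a conventional baseline for cross-areal interaction analysis, constrains the linear map from one neural population to another to a low-dimensional "communication channel". Below is a minimal sketch of the classical closed-form solution on synthetic data; it illustrates only the baseline, not the proposed demixed shared component analysis:

```python
# Reduced-rank regression (RRR) sketch: fit OLS, then project the fitted
# responses onto their top-`rank` principal directions, yielding a
# rank-constrained coefficient matrix. Data below are synthetic.
import numpy as np

def reduced_rank_regression(X, Y, rank):
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)   # full-rank OLS solution
    Y_fit = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    V_r = Vt[:rank].T                               # top right singular vectors
    return B_ols @ V_r @ V_r.T                      # rank-`rank` coefficients

rng = np.random.default_rng(1)
n, p, q, r = 500, 30, 40, 3                          # samples, source/target dims, rank

# Source-area activity X drives target-area activity Y through a
# ground-truth rank-3 channel, plus observation noise.
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B = reduced_rank_regression(X, Y, rank=r)
print(np.linalg.matrix_rank(B))  # 3
```

The abstract's critique applies to exactly this kind of model: the recovered low-rank channel mixes together all stimulus-driven covariation, with no way to attribute its dimensions to particular task parameters or stimulus modalities.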