Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yunhao Zhang

How Syntax Specialization Emerges in Language Models

May 26, 2025

Xufeng Duan, Zhaoqian Yao, Yunhao Zhang, Shaonan Wang, Zhenguang G. Cai

Abstract:Large language models (LLMs) have been found to develop surprising internal specializations: Individual neurons, attention heads, and circuits become selectively sensitive to syntactic structure, reflecting patterns observed in the human brain. While this specialization is well-documented, how it emerges during training and what influences its development remains largely unknown. In this work, we tap into the black box of specialization by tracking its formation over time. By quantifying internal syntactic consistency across minimal pairs from various syntactic phenomena, we identify a clear developmental trajectory: Syntactic sensitivity emerges gradually, concentrates in specific layers, and exhibits a 'critical period' of rapid internal specialization. This process is consistent across architectures and initialization parameters (e.g., random seeds), and is influenced by model scale and training data. We therefore reveal not only where syntax arises in LLMs but also how some models internalize it during training. To support future research, we will release the code, models, and training checkpoints upon acceptance.

Via

Access Paper or Ask Questions

SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection

Mar 11, 2025

Hyeongseok Son, Jia He, Seung-In Park, Ying Min, Yunhao Zhang, ByungIn Yoo

Abstract:Most previous 3D object detection methods that leverage the multi-modality of LiDAR and cameras utilize the Bird's Eye View (BEV) space for intermediate feature representation. However, this space uses a low x, y-resolution and sacrifices z-axis information to reduce the overall feature resolution, which may result in declined accuracy. To tackle the problem of using low-resolution features, this paper focuses on the sparse nature of LiDAR point cloud data. From our observation, the number of occupied cells in the 3D voxels constructed from a LiDAR data can be even fewer than the number of total cells in the BEV map, despite the voxels' significantly higher resolution. Based on this, we introduce a novel sparse voxel-based transformer network for 3D object detection, dubbed as SparseVoxFormer. Instead of performing BEV feature extraction, we directly leverage sparse voxel features as the input for a transformer-based detector. Moreover, with regard to the camera modality, we introduce an explicit modality fusion approach that involves projecting 3D voxel coordinates onto 2D images and collecting the corresponding image features. Thanks to these components, our approach can leverage geometrically richer multi-modal features while even reducing the computational cost. Beyond the proof-of-concept level, we further focus on facilitating better multi-modal fusion and flexible control over the number of sparse features. Finally, thorough experimental results demonstrate that utilizing a significantly smaller number of sparse features drastically reduces computational costs in a 3D object detector while enhancing both overall and long-range performance.

Via

Access Paper or Ask Questions

LinSATNet: The Positive Linear Satisfiability Neural Networks

Jul 18, 2024

Runzhong Wang, Yunhao Zhang, Ziao Guo, Tianyi Chen, Xiaokang Yang, Junchi Yan

Abstract:Encoding constraints into neural networks is attractive. This paper studies how to introduce the popular positive linear satisfiability to neural networks. We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions. We further theoretically characterize the convergence property of the Sinkhorn algorithm for multiple marginals. In contrast to the sequential decision e.g.\ reinforcement learning-based solvers, we showcase our technique in solving constrained (specifically satisfiability) problems by one-shot neural networks, including i) a neural routing solver learned without supervision of optimal solutions; ii) a partial graph matching network handling graphs with unmatchable outliers on both sides; iii) a predictive network for financial portfolios with continuous constraints. To our knowledge, there exists no one-shot neural solver for these scenarios when they are formulated as satisfiability problems. Source code is available at https://github.com/Thinklab-SJTU/LinSATNet

* This is a revised version of our ICML'23 publication that fixes a minor issue in Eq (11). In Proceedings of the 40th International Conference on Machine Learning (ICML'23)

Via

Access Paper or Ask Questions

Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Apr 30, 2024

Yunhao Zhang, Shaonan Wang, Xinyi Dong, Jiajun Yu, Chengqing Zong

Figure 1 for Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Figure 2 for Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Figure 3 for Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Figure 4 for Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Abstract:Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, we conducted an analysis comparing encoding performance of various neural language models and psychologically plausible models. Our study utilized extensive multi-modal cognitive datasets, examining bilingual word and discourse levels. Surprisingly, our findings revealed that psychologically plausible models outperformed neural language models across diverse contexts, encompassing different modalities such as fMRI and eye-tracking, and spanning languages from English to Chinese. Among psychologically plausible models, the one incorporating embodied information emerged as particularly exceptional. This model demonstrated superior performance at both word and discourse levels, exhibiting robust prediction of brain activation across numerous regions in both English and Chinese.

Via

Access Paper or Ask Questions

Computational Models to Study Language Processing in the Human Brain: A Survey

Mar 20, 2024

Shaonan Wang, Jingyuan Sun, Yunhao Zhang, Nan Lin, Marie-Francine Moens, Chengqing Zong

Figure 1 for Computational Models to Study Language Processing in the Human Brain: A Survey

Figure 2 for Computational Models to Study Language Processing in the Human Brain: A Survey

Figure 3 for Computational Models to Study Language Processing in the Human Brain: A Survey

Figure 4 for Computational Models to Study Language Processing in the Human Brain: A Survey

Abstract:Despite differing from the human language processing mechanism in implementation and algorithms, current language models demonstrate remarkable human-like or surpassing language capabilities. Should computational language models be employed in studying the brain, and if so, when and how? To delve into this topic, this paper reviews efforts in using computational models for brain research, highlighting emerging trends. To ensure a fair comparison, the paper evaluates various computational models using consistent metrics on the same dataset. Our analysis reveals that no single model outperforms others on all datasets, underscoring the need for rich testing datasets and rigid experimental control to draw robust conclusions in studies involving computational models.

Via

Access Paper or Ask Questions

MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models

Mar 02, 2024

Yunhao Zhang, Xiaohan Zhang, Chong Li, Shaonan Wang, Chengqing Zong

Abstract:Pre-trained computational language models have recently made remarkable progress in harnessing the language abilities which were considered unique to humans. Their success has raised interest in whether these models represent and process language like humans. To answer this question, this paper proposes MulCogBench, a multi-modal cognitive benchmark dataset collected from native Chinese and English participants. It encompasses a variety of cognitive data, including subjective semantic ratings, eye-tracking, functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG). To assess the relationship between language models and cognitive data, we conducted a similarity-encoding analysis which decodes cognitive data based on its pattern similarity with textual embeddings. Results show that language models share significant similarities with human cognitive data and the similarity patterns are modulated by the data modality and stimuli complexity. Specifically, context-aware models outperform context-independent models as language stimulus complexity increases. The shallow layers of context-aware models are better aligned with the high-temporal-resolution MEG signals whereas the deeper layers show more similarity with the high-spatial-resolution fMRI. These results indicate that language models have a delicate relationship with brain language representations. Moreover, the results between Chinese and English are highly consistent, suggesting the generalizability of these findings across languages.

Via

Access Paper or Ask Questions

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Oct 16, 2023

Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong

Figure 1 for Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Figure 2 for Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Figure 3 for Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Figure 4 for Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Abstract:Transformer-based models, even though achieving super-human performance on several downstream tasks, are often regarded as a black box and used as a whole. It is still unclear what mechanisms they have learned, especially their core module: multi-head attention. Inspired by functional specialization in the human brain, which helps to efficiently handle multiple tasks, this work attempts to figure out whether the multi-head attention module will evolve similar function separation under multi-tasking training. If it is, can this mechanism further improve the model performance? To investigate these questions, we introduce an interpreting method to quantify the degree of functional specialization in multi-head attention. We further propose a simple multi-task training method to increase functional specialization and mitigate negative information transfer in multi-task learning. Experimental results on seven pre-trained transformer models have demonstrated that multi-head attention does evolve functional specialization phenomenon after multi-task training which is affected by the similarity of tasks. Moreover, the multi-task training strategy based on functional specialization boosts performance in both multi-task learning and transfer learning without adding any parameters.

* Accepted to EMNLP 2023 main conference. Our code is available at https://github.com/ZNLP/FunctionalSpecializationInMHA

Via

Access Paper or Ask Questions

Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

May 26, 2023

Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens

Figure 1 for Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Figure 2 for Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Figure 3 for Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Figure 4 for Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Abstract:Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.

* 17 pages, 6 figures, conference

Via

Access Paper or Ask Questions

StyleBERT: Chinese pretraining by font style information

Feb 23, 2022

Chao Lv, Han Zhang, XinKai Du, Yunhao Zhang, Ying Huang, Wenhao Li, Jia Han, Shanshan Gu

Figure 1 for StyleBERT: Chinese pretraining by font style information

Figure 2 for StyleBERT: Chinese pretraining by font style information

Figure 3 for StyleBERT: Chinese pretraining by font style information

Figure 4 for StyleBERT: Chinese pretraining by font style information

Abstract:With the success of down streaming task using English pre-trained language model, the pre-trained Chinese language model is also necessary to get a better performance of Chinese NLP task. Unlike the English language, Chinese has its special characters such as glyph information. So in this article, we propose the Chinese pre-trained language model StyleBERT which incorporate the following embedding information to enhance the savvy of language model, such as word, pinyin, five stroke and chaizi. The experiments show that the model achieves well performances on a wide range of Chinese NLP tasks.

Via

Access Paper or Ask Questions

Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Jan 31, 2021

Longyuan Li, Jihai Zhang, Junchi Yan, Yaohui Jin, Yunhao Zhang, Yanjie Duan, Guangjian Tian

Figure 1 for Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Figure 2 for Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Figure 3 for Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Figure 4 for Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

Abstract:Time-series is ubiquitous across applications, such as transportation, finance and healthcare. Time-series is often influenced by external factors, especially in the form of asynchronous events, making forecasting difficult. However, existing models are mainly designated for either synchronous time-series or asynchronous event sequence, and can hardly provide a synthetic way to capture the relation between them. We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are carefully devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte-Carlo simulation. Furthermore, our model produces accurate, sharp and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.

* Accepted by AAAI 2021 conference

Via

Access Paper or Ask Questions