Chan Li

Meta predictive learning model of natural languages

Sep 08, 2023

Chan Li, Junbin Qiu, Haiping Huang

Abstract: Large language models based on self-attention mechanisms have achieved astonishing performance not only in natural language itself, but also in a variety of tasks of different nature. However, the human brain may not process language using the same principle, which has prompted a debate on the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike and slab distribution, and only the distribution is trained. This meta predictive learning is successfully validated on classifying handwritten digits where pixels are input to the network in sequence, and on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point to investigate the physics and biology correspondences of language processing and the unexpected general intelligence.

* 23 pages, 6 figures; code is available via the link in the main text

Statistical mechanics of continual learning: variational principle and mean-field potential

Dec 07, 2022

Chan Li, Zhenye Huang, Wenxuan Zou, Haiping Huang

Abstract: An obstacle to artificial general intelligence is set by the continual learning of multiple tasks of different nature. Recently, various heuristic tricks, both from machine learning and from neuroscience angles, were proposed, but they lack a unified theoretical ground. Here, we focus on continual learning in single-layered and multi-layered neural networks of binary weights. A variational Bayesian learning setting is thus proposed, where the neural network is trained in a field-space, rather than the gradient-ill-defined discrete-weight space; furthermore, the weight uncertainty is naturally incorporated and modulates the synaptic resources among tasks. From a physics perspective, we translate the variational continual learning into the Franz-Parisi thermodynamic potential framework, where the previous task knowledge acts as both a prior and a reference. Therefore, the learning performance can be analytically studied with mean-field order parameters, whose predictions coincide with the numerical experiments using stochastic gradient descent methods. Our proposed principled frameworks also connect to elastic weight consolidation and neuroscience-inspired metaplasticity, providing a theory-grounded method for real-world multi-task learning with deep networks.

* 45 pages, 7 figures

Emergence of hierarchical modes from deep learning

Aug 21, 2022

Chan Li, Haiping Huang

Abstract: Large-scale deep neural networks incur expensive training costs, yet the training results in less-interpretable weight matrices constructing the networks. Here, we propose a mode decomposition learning that can interpret the weight matrices as a hierarchy of latent modes. These modes are akin to patterns in physics studies of memory networks. The mode decomposition learning not only saves a significant amount of training cost, but also explains the network performance with the leading modes. The mode learning scheme shows a progressively compact latent space across the network hierarchy, and the minimal number of modes increases only logarithmically with the network width. Our mode decomposition learning is also studied in an analytic on-line learning setting, which reveals multi-stage learning dynamics. Therefore, the proposed mode decomposition learning points to a cheap and interpretable route towards the magical deep learning.

* 5 pages, 4 figures, and SM is available upon request

Ensemble perspective for understanding temporal credit assignment

Feb 07, 2021

Wenxuan Zou, Chan Li, Haiping Huang

Abstract: Recurrent neural networks are widely used for modeling spatio-temporal sequences in both natural language processing and neural population dynamics. However, understanding the temporal credit assignment is hard. Here, we propose that each individual connection in the recurrent computation is modeled by a spike and slab distribution, rather than a precise weight value. We then derive the mean-field algorithm to train the network at the ensemble level. The method is then applied to classify handwritten digits when pixels are read in sequence, and to the multisensory integration task that is a fundamental cognitive function of animals. Our model reveals important connections that determine the overall performance of the network. The model also shows how spatio-temporal information is processed through the hyperparameters of the distribution, and moreover reveals distinct types of emergent neural selectivity. It is thus promising to study the temporal credit assignment in recurrent neural networks from the ensemble perspective.

* 17 pages, 18 figures, comments are welcome
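
The abstract above describes replacing each precise weight with a spike and slab distribution and training at the ensemble level. As a rough illustration only (not the authors' code), the sketch below assumes the common parameterization in which a weight is zero with probability `pi` (the spike) and otherwise drawn from a Gaussian slab with mean `m` and variance `xi`. The names `pi`, `m`, `xi` and all function names are hypothetical, and the Gaussian approximation of the pre-activation is an assumption that relies on a central-limit argument over many independent inputs.

```python
import numpy as np

def spike_slab_moments(pi, m, xi):
    """Mean and variance of w ~ pi * delta(w) + (1 - pi) * N(m, xi).

    pi, m, xi are arrays of the same shape (assumed parameterization).
    """
    mean = (1.0 - pi) * m
    second_moment = (1.0 - pi) * (xi + m ** 2)
    return mean, second_moment - mean ** 2

def mean_field_preactivation(x, pi, m, xi):
    """Mean and variance of z = W @ x when each W[i, j] is an independent
    spike and slab variable (assumed: z is treated as Gaussian)."""
    mu, rho = spike_slab_moments(pi, m, xi)
    return mu @ x, rho @ (x ** 2)

def sample_weights(rng, pi, m, xi):
    """Draw one concrete network from the ensemble."""
    slab = rng.normal(m, np.sqrt(xi))       # Gaussian slab draw
    keep = rng.random(pi.shape) >= pi       # True with probability 1 - pi
    return slab * keep                      # spike: weight set to zero

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = np.full((3, 5), 0.5)
    m = rng.normal(size=(3, 5))
    xi = np.full((3, 5), 0.1)
    x = rng.normal(size=5)
    G, Delta2 = mean_field_preactivation(x, pi, m, xi)
    print("pre-activation mean:", G)
    print("pre-activation variance:", Delta2)
```

In this picture, a connection with `pi` near 0 and `xi` near 0 behaves like a deterministic weight, while `pi` near 1 corresponds to a connection that is effectively absent.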

Learning credit assignment

Jan 10, 2020

Chan Li, Haiping Huang

Abstract: Deep learning has achieved impressive prediction accuracies in a variety of scientific and industrial domains. However, the nested non-linear feature of deep learning makes the learning highly non-transparent, i.e., it is still unknown how the learning coordinates a huge number of parameters to reach a decision. To explain this hierarchical credit assignment, we propose a mean-field learning model by assuming that an ensemble of sub-networks, rather than a single network, is trained for a classification task. Surprisingly, our model reveals that apart from some deterministic synaptic weights connecting two neurons at neighboring layers, there exist a large number of connections that can be absent, and other connections can allow for a broad distribution of their weight values. Therefore, synaptic connections can be classified into three categories: very important ones, unimportant ones, and those of variability that may partially encode nuisance factors. Our model thus learns the credit assignment leading to the decision, and predicts an ensemble of sub-networks that can accomplish the same task, thereby providing insights toward understanding the macroscopic behavior of deep learning through the lens of distinct roles of synaptic weights.

* 5 pages, 3 figures, a generalized BackProp proposed to learn credit assignment from a network ensemble perspective
