Abstract: Restricted Boltzmann machines (RBM) are generative models capable of learning data with a rich underlying structure. We study the teacher-student setting in which a student RBM learns structured data generated by a teacher RBM. The amount of structure in the data is controlled by adjusting the number of hidden units of the teacher and the correlations in the rows of the weights, a.k.a. patterns. In the absence of correlations, we validate the conjecture that the performance is independent of the number of teacher patterns and of student hidden units, and we argue that the teacher-student setting can be used as a toy model for studying the lottery ticket hypothesis. Beyond this regime, we find that the critical amount of data required to learn the teacher patterns decreases with both their number and their correlations. In both regimes, we find that, even with a relatively large dataset, it becomes impossible to learn the teacher patterns if the inference temperature used for regularization is kept too low. In our framework, the student can learn teacher patterns one-to-one or many-to-one, generalizing previous findings about the teacher-student setting with two hidden units to an arbitrary finite number of hidden units.
Abstract: Dense Hopfield networks are known for their feature-to-prototype transition and adversarial robustness. However, previous theoretical studies have mostly been concerned with their storage capacity. We bridge this gap by studying the phase diagram of p-body Hopfield networks in the teacher-student setting of an unsupervised learning problem, uncovering ferromagnetic phases reminiscent of the prototype and feature learning regimes. On the Nishimori line, we find the critical size of the training set necessary for efficient pattern retrieval. Interestingly, we find that the paramagnetic to ferromagnetic transition of the teacher-student setting coincides with the paramagnetic to spin-glass transition of the direct model, i.e. the one with random patterns. Outside of the Nishimori line, we investigate the learning performance in relation to the inference temperature and dataset noise. Moreover, we show that using a larger p for the student than for the teacher gives the student an extensive tolerance to noise. We then derive a closed-form expression measuring the adversarial robustness of such a student at zero temperature, corroborating the positive correlation between the number of parameters and robustness observed in large neural networks. We also use our model to clarify why the prototype phase of modern Hopfield networks is adversarially robust.
Abstract: Networks in machine learning offer examples of complex high-dimensional dynamical systems reminiscent of biological systems. Here, we study the learning dynamics of Generalized Hopfield networks, which permit a visualization of internal memories. These networks have been shown to proceed through a 'feature-to-prototype' transition as the strength of the network nonlinearity is increased, wherein the learned, or terminal, states of internal memories transition from mixed to pure states. Focusing on the prototype learning dynamics of the internal memories, we observe a strong resemblance to the canalized, or low-dimensional, dynamics of cells as they differentiate within a Waddingtonian landscape. Dynamically, we demonstrate that learning in a Generalized Hopfield network proceeds through sequential 'splits' in memory space. Furthermore, the order of splitting is interpretable and reproducible. The dynamics between the splits are canalized in the Waddington sense -- robust to variations in detailed aspects of the system. In attempting to make the analogy a rigorous equivalence, we study smaller subsystems that exhibit properties similar to those of the full system. We combine analytical calculations with numerical simulations to study the dynamical emergence of the feature-to-prototype transition and the behaviour of the splits in the landscape, i.e. the saddle points visited during learning. We exhibit regimes where saddles appear and disappear through saddle-node bifurcations, qualitatively changing the distribution of learned memories as the strength of the nonlinearity is varied -- allowing us to systematically investigate the mechanisms that underlie the emergence of Waddingtonian dynamics. Memories can thus differentiate in a predictive and controlled way, revealing new bridges between experimental biology, dynamical systems theory, and machine learning.