Abstract:In this manuscript I present an analysis on the performance of OpenAI O1-preview model in solving random K-SAT instances for K$\in {2,3,4}$ as a function of $\alpha=M/N$ where $M$ is the number of clauses and $N$ is the number of variables of the satisfiable problem. I show that the model can call an external SAT solver to solve the instances, rather than solving them directly. Despite using external solvers, the model reports incorrect assignments as output. Moreover, I propose and present an analysis to quantify whether the OpenAI O1-preview model demonstrates a spark of intelligence or merely makes random guesses when outputting an assignment for a Boolean satisfiability problem.
Abstract:The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. By incorporating stable attractors into such a metapopulation model's dynamics, we transform it into a learning algorithm capable of achieving high image and text classification accuracy. We test it on MNIST and Fashion MNIST, in combination with convolutional neural networks, on CIFAR-10 and TF-FLOWERS, and, in combination with a transformer architecture (BERT), on IMDB, always showing high classification accuracy. These numerical evaluations illustrate that minimal modifications to the Wilson-Cowan model for metapopulation can reveal unique and previously unobserved dynamics.
Abstract:Working with high-dimensional data is a common practice, in the field of machine learning. Identifying relevant input features is thus crucial, so as to obtain compact dataset more prone for effective numerical handling. Further, by isolating pivotal elements that form the basis of decision making, one can contribute to elaborate on - ex post - models' interpretability, so far rather elusive. Here, we propose a novel method to estimate the relative importance of the input components for a Deep Neural Network. This is achieved by leveraging on a spectral re-parametrization of the optimization process. Eigenvalues associated to input nodes provide in fact a robust proxy to gauge the relevance of the supplied entry features. Unlike existing techniques, the spectral features ranking is carried out automatically, as a byproduct of the network training. The technique is successfully challenged against both synthetic and real data.
Abstract:A substantial gap persists in understanding the reasons behind the exceptional performance of the Transformer architecture in NLP. A particularly unexplored area involves the mechanistic description of how the distribution of parameters evolves over time during training. In this work we suggest that looking at the time evolution of the statistic distribution of model parameters, and specifically at bifurcation effects, can help understanding the model quality, potentially reducing training costs and evaluation efforts and empirically showing the reasons behind the effectiveness of weights sparsification.
Abstract:This manuscript provides a comprehensive review of the Maximum Clique Problem, a computational problem that involves finding subsets of vertices in a graph that are all pairwise adjacent to each other. The manuscript covers in a simple way classical algorithms for solving the problem and includes a review of recent developments in graph neural networks and quantum algorithms. The review concludes with benchmarks for testing classical as well as new learning, and quantum algorithms.
Abstract:EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1], and employing a Euler discretization scheme. In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre and post training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.
Abstract:This paper presents a novel approach to advancing artificial intelligence (AI) through the development of the Complex Recurrent Spectral Network ($\mathbb{C}$-RSN), an innovative variant of the Recurrent Spectral Network (RSN) model. The $\mathbb{C}$-RSN is designed to address a critical limitation in existing neural network models: their inability to emulate the complex processes of biological neural networks dynamically and accurately. By integrating key concepts from dynamical systems theory and leveraging principles from statistical mechanics, the $\mathbb{C}$-RSN model introduces localized non-linearity, complex fixed eigenvalues, and a distinct separation of memory and input processing functionalities. These features collectively enable the $\mathbb{C}$-RSN evolving towards a dynamic, oscillating final state that more closely mirrors biological cognition. Central to this work is the exploration of how the $\mathbb{C}$-RSN manages to capture the rhythmic, oscillatory dynamics intrinsic to biological systems, thanks to its complex eigenvalue structure and the innovative segregation of its linear and non-linear components. The model's ability to classify data through a time-dependent function, and the localization of information processing, is demonstrated with an empirical evaluation using the MNIST dataset. Remarkably, distinct items supplied as a sequential input yield patterns in time which bear the indirect imprint of the insertion order (and of the time of separation between contiguous insertions).
Abstract:In a world increasingly reliant on machine learning, the interpretability of these models remains a substantial challenge, with many equating their functionality to an enigmatic black box. This study seeks to bridge machine learning and dynamical systems. Recognizing the deep parallels between dense neural networks and dynamical systems, particularly in the light of non-linearities and successive transformations, this manuscript introduces the Engineered Ordinary Differential Equations as Classification Algorithms (EODECAs). Uniquely designed as neural networks underpinned by continuous ordinary differential equations, EODECAs aim to capitalize on the well-established toolkit of dynamical systems. Unlike traditional deep learning models, which often suffer from opacity, EODECAs promise both high classification performance and intrinsic interpretability. They are naturally invertible, granting them an edge in understanding and transparency over their counterparts. By bridging these domains, we hope to usher in a new era of machine learning models where genuine comprehension of data processes complements predictive prowess.
Abstract:Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g.\ SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
Abstract:The use of mini-batches of data in training artificial neural networks is nowadays very common. Despite its broad usage, theories explaining quantitatively how large or small the optimal mini-batch size should be are missing. This work presents a systematic attempt at understanding the role of the mini-batch size in training two-layer neural networks. Working in the teacher-student scenario, with a sparse teacher, and focusing on tasks of different complexity, we quantify the effects of changing the mini-batch size $m$. We find that often the generalization performances of the student strongly depend on $m$ and may undergo sharp phase transitions at a critical value $m_c$, such that for $m<m_c$ the training process fails, while for $m>m_c$ the student learns perfectly or generalizes very well the teacher. Phase transitions are induced by collective phenomena firstly discovered in statistical mechanics and later observed in many fields of science. Finding a phase transition varying the mini-batch size raises several important questions on the role of a hyperparameter which have been somehow overlooked until now.