Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alessandro Betti

IMT Scuola Alti Studi, Lucca, Italy

Generative System Dynamics in Recurrent Neural Networks

Apr 16, 2025

Michele Casoni, Tommaso Guidi, Alessandro Betti, Stefano Melacci, Marco Gori

Abstract:In this study, we investigate the continuous time dynamics of Recurrent Neural Networks (RNNs), focusing on systems with nonlinear activation functions. The objective of this work is to identify conditions under which RNNs exhibit perpetual oscillatory behavior, without converging to static fixed points. We establish that skew-symmetric weight matrices are fundamental to enable stable limit cycles in both linear and nonlinear configurations. We further demonstrate that hyperbolic tangent-like activation functions (odd, bounded, and continuous) preserve these oscillatory dynamics by ensuring motion invariants in state space. Numerical simulations showcase how nonlinear activation functions not only maintain limit cycles, but also enhance the numerical stability of the system integration process, mitigating those instabilities that are commonly associated with the forward Euler method. The experimental results of this analysis highlight practical considerations for designing neural architectures capable of capturing complex temporal dependencies, i.e., strategies for enhancing memorization skills in recurrent models.

Via

Access Paper or Ask Questions

A Unified Framework for Neural Computation and Learning Over Time

Sep 18, 2024

Stefano Melacci, Alessandro Betti, Michele Casoni, Tommaso Guidi, Matteo Tiezzi, Marco Gori

Abstract:This paper proposes Hamiltonian Learning, a novel unified framework for learning with neural networks "over time", i.e., from a possibly infinite stream of data, in an online manner, without having access to future information. Existing works focus on the simplified setting in which the stream has a known finite length or is segmented into smaller sequences, leveraging well-established learning strategies from statistical machine learning. In this paper, the problem of learning over time is rethought from scratch, leveraging tools from optimal control theory, which yield a unifying view of the temporal dynamics of neural computations and learning. Hamiltonian Learning is based on differential equations that: (i) can be integrated without the need of external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open to novel perspectives. The proposed framework is showcased by experimentally proving how it can recover gradient-based learning, comparing it to out-of-the box optimizers, and describing how it is flexible enough to switch from fully-local to partially/non-local computational schemes, possibly distributed over multiple devices, and BackPropagation without storing activations. Hamiltonian Learning is easy to implement and can help researches approach in a principled and innovative manner the problem of learning over time.

Via

Access Paper or Ask Questions

Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Sep 10, 2024

Jinwei Zhao, Marco Gori, Alessandro Betti, Stefano Melacci, Hongtao Zhang, Jiedong Liu, Xinhong Hei

Figure 1 for Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Figure 2 for Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Figure 3 for Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Figure 4 for Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Abstract:Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a large number of application domains. Therefore, understanding the dynamics of GD and improving its convergence speed is still of great importance. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow. On the basis of the terminal sliding mode theory and the terminal attractor theory, four adaptive learning rates are designed. Their performances are investigated in light of a detailed theoretical investigation, and the running times of the learning procedures are evaluated and compared. The total times of their learning processes are also studied in detail. To evaluate their effectiveness, various simulation results are investigated on a function approximation problem and an image classification problem.

* 8 pages, 4 figures

Via

Access Paper or Ask Questions

State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

Jun 13, 2024

Matteo Tiezzi, Michele Casoni, Alessandro Betti, Marco Gori, Stefano Melacci

Abstract:Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. From the dawn of Machine Learning, several researchers engaged in the search of algorithms and architectures capable of processing sequences of patterns, retaining information about the past inputs while still leveraging the upcoming data, without losing precious long-term dependencies and correlations. While such an ultimate goal is inspired by the human hallmark of continuous real-time processing of sensory information, several solutions simplified the learning paradigm by artificially limiting the processed context or dealing with sequences of limited length, given in advance. These solutions were further emphasized by the large ubiquity of Transformers, that have initially shaded the role of Recurrent Neural Nets. However, recurrent networks are facing a strong recent revival due to the growing popularity of (deep) State-Space models and novel instances of large-context Transformers, which are both based on recurrent computations to go beyond several limits of currently ubiquitous technologies. In fact, the fast development of Large Language Models enhanced the interest in efficient solutions to process data over time. This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. A complete taxonomy over the latest trends in architectural and algorithmic solutions is reported and discussed, guiding researchers in this appealing research field. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time, towards a more realistic scenario where patterns are effectively processed online, leveraging local-forward computations, opening to further research on this topic.

* Currently under review

Via

Access Paper or Ask Questions

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

Feb 14, 2024

Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci

Figure 1 for On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

Abstract:A longstanding challenge for the Machine Learning community is the one of developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformers-based networks (e.g., Large Language Models) promotes the idea of parallel attention as the key to succeed in such a challenge, obfuscating the role of classic sequential processing of Recurrent Models. However, in the last few years, researchers who were concerned by the quadratic complexity of self-attention have been proposing a novel wave of neural models, which gets the best from the two worlds, i.e., Transformers and Recurrent Nets. Meanwhile, Deep Space-State Models emerged as robust approaches to function approximation over time, thus opening a new perspective in learning from sequential data, followed by many people in the field and exploited to implement a special class of (linear) Recurrent Neural Networks. This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence. Moreover, it emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences whose length is known-in-advance for the more realistic setting of potentially infinite-length sequences, thus intersecting the field of lifelong-online learning from streamed data.

* Under review

Via

Access Paper or Ask Questions

Nature-Inspired Local Propagation

Feb 04, 2024

Alessandro Betti, Marco Gori

Figure 1 for Nature-Inspired Local Propagation

Figure 2 for Nature-Inspired Local Propagation

Figure 3 for Nature-Inspired Local Propagation

Figure 4 for Nature-Inspired Local Propagation

Abstract:The spectacular results achieved in machine learning, including the recent advances in generative AI, rely on large data collections. On the opposite, intelligent processes in nature arises without the need for such collections, but simply by online processing of the environmental information. In particular, natural learning processes rely on mechanisms where data representation and learning are intertwined in such a way to respect spatiotemporal locality. This paper shows that such a feature arises from a pre-algorithmic view of learning that is inspired by related studies in Theoretical Physics. We show that the algorithmic interpretation of the derived "laws of learning", which takes the structure of Hamiltonian equations, reduces to Backpropagation when the speed of propagation goes to infinity. This opens the doors to machine learning studies based on full on-line information processing that are based the replacement of Backpropagation with the proposed spatiotemporal local algorithm.

Via

Access Paper or Ask Questions

Neural Time-Reversed Generalized Riccati Equation

Dec 14, 2023

Alessandro Betti, Michele Casoni, Marco Gori, Simone Marullo, Stefano Melacci, Matteo Tiezzi

Figure 1 for Neural Time-Reversed Generalized Riccati Equation

Figure 2 for Neural Time-Reversed Generalized Riccati Equation

Figure 3 for Neural Time-Reversed Generalized Riccati Equation

Figure 4 for Neural Time-Reversed Generalized Riccati Equation

Abstract:Optimal control deals with optimization problems in which variables steer a dynamical system, and its outcome contributes to the objective function. Two classical approaches to solving these problems are Dynamic Programming and the Pontryagin Maximum Principle. In both approaches, Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates. However, Hamiltonian equations are rarely used due to their reliance on forward-backward algorithms across the entire temporal domain. This paper introduces a novel neural-based approach to optimal control, with the aim of working forward-in-time. Neural networks are employed not only for implementing state dynamics but also for estimating costate variables. The parameters of the latter network are determined at each time step using a newly introduced local policy referred to as the time-reversed generalized Riccati equation. This policy is inspired by a result discussed in the Linear Quadratic (LQ) problem, which we conjecture stabilizes state dynamics. We support this conjecture by discussing experimental results from a range of optimal control case studies.

Via

Access Paper or Ask Questions

PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Oct 17, 2022

Enrico Meloni, Lapo Faggi, Simone Marullo, Alessandro Betti, Matteo Tiezzi, Marco Gori, Stefano Melacci

Figure 1 for PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Figure 2 for PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Figure 3 for PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Figure 4 for PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Abstract:In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications that are based on streamed data. Differently, PARTIME starts processing each data sample at the time in which it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and it distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data with samples that are smoothly evolving over time for efficient gradient computations. Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations on up to 8 NVIDIA GPUs, showing significant speedups that are almost linear in the number of devices, mitigating the impact of the data transfer overhead.

* 9 pages, accepted at International Conference on Machine Learning and Applications

Via

Access Paper or Ask Questions

Deep Learning to See: Towards New Foundations of Computer Vision

Jun 30, 2022

Alessandro Betti, Marco Gori, Stefano Melacci

Abstract:The remarkable progress in computer vision over the last few years is, by and large, attributed to deep learning, fueled by the availability of huge sets of labeled data, and paired with the explosive growth of the GPU paradigm. While subscribing to this view, this book criticizes the supposed scientific progress in the field and proposes the investigation of vision within the framework of information-based laws of nature. Specifically, the present work poses fundamental questions about vision that remain far from understood, leading the reader on a journey populated by novel challenges resonating with the foundations of machine learning. The central thesis is that for a deeper understanding of visual computational processes, it is necessary to look beyond the applications of general purpose machine learning algorithms and focus instead on appropriate learning theories that take into account the spatiotemporal nature of the visual signal.

Via

Access Paper or Ask Questions

Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Apr 26, 2022

Matteo Tiezzi, Simone Marullo, Lapo Faggi, Enrico Meloni, Alessandro Betti, Stefano Melacci

Figure 1 for Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Figure 2 for Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Figure 3 for Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Figure 4 for Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Abstract:Devising intelligent agents able to live in an environment and learn by observing the surroundings is a longstanding goal of Artificial Intelligence. From a bare Machine Learning perspective, challenges arise when the agent is prevented from leveraging large fully-annotated dataset, but rather the interactions with supervisory signals are sparsely distributed over space and time. This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream. The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations. Spatio-temporal stochastic coherence along the attention trajectory, paired with a contrastive term, leads to an unsupervised learning criterion that naturally copes with the considered setting. Differently from most existing works, the learned representations are used in open-set class-incremental classification of each frame pixel, relying on few supervisions. Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream. Inheriting features from state-of-the art models is not as powerful as one might expect.

* Accepted for publication in the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022)

Via

Access Paper or Ask Questions