Abstract: Recent advances in reinforcement learning (RL) have led to significant improvements in task performance. However, training neural networks in an RL regime typically relies on backpropagation (BP), which limits its applicability in resource-constrained environments or with non-differentiable neural networks. While noise-based alternatives such as reward-modulated Hebbian learning (RMHL) have been proposed, their performance has remained limited, especially in scenarios with delayed rewards, which require retrospective credit assignment over time. Here, we derive a novel noise-based learning rule that addresses these challenges. Our approach combines directional derivative theory with Hebbian-like updates to enable efficient, gradient-free learning in RL. It features stochastic, noisy neurons that can approximate gradients, and it produces local synaptic updates modulated by a global reward signal. Drawing on concepts from neuroscience, our method uses reward prediction error as its optimization target to generate increasingly advantageous behavior, and it incorporates an eligibility trace to facilitate temporal credit assignment in environments with delayed rewards. Its formulation relies on local information alone, making it compatible with implementations in neuromorphic hardware. Experimental validation shows that our approach significantly outperforms RMHL and is competitive with BP-based baselines, highlighting the promise of noise-based, biologically inspired learning for low-power and real-time applications.
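The abstract above gives no equations, so the following is only a minimal sketch of the general idea it describes, not the authors' rule. It assumes a single linear neuron, Gaussian output noise, a reward that arrives only at the end of a short episode, and a noise-free second pass whose return serves as the baseline. The difference between the noisy and noise-free returns stands in for the reward prediction error here, and the paper may define that signal differently. All names and constants (`TARGET_W`, `SIGMA`, `ETA`, `DECAY`, `STEPS`) are hypothetical.

```python
import random

random.seed(0)

TARGET_W = [1.0, -0.5]  # hypothetical "true" weights that define the toy task
SIGMA = 0.1             # std of the noise injected into the neuron's output
ETA = 0.02              # learning rate
DECAY = 0.9             # eligibility-trace decay per time step
STEPS = 3               # steps per episode; the reward arrives only at the end


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_error(w):
    """Squared distance between the learned and the target weights."""
    return sum((a - b) ** 2 for a, b in zip(w, TARGET_W))


def run_episode(w):
    """Run one episode and return the weight update from the delayed reward."""
    trace = [0.0 for _ in w]
    noisy_return = 0.0
    clean_return = 0.0
    for _ in range(STEPS):
        x = [random.uniform(-1.0, 1.0) for _ in w]
        target = dot(TARGET_W, x)
        clean = dot(w, x)                # noise-free output (baseline pass)
        xi = random.gauss(0.0, SIGMA)    # noise injected into the neuron
        noisy = clean + xi
        # Local, Hebbian-like term: presynaptic input times postsynaptic noise,
        # accumulated in a decaying eligibility trace.
        trace = [DECAY * e + xi * xj / SIGMA ** 2 for e, xj in zip(trace, x)]
        noisy_return -= (noisy - target) ** 2
        clean_return -= (clean - target) ** 2
    # Global scalar signal: how much better or worse the noisy episode was.
    # Assumption: this plays the role of the reward prediction error.
    delta = noisy_return - clean_return
    return [ETA * delta * e for e in trace]


w = [0.0, 0.0]
initial_error = sq_error(w)
for _ in range(2000):
    dw = run_episode(w)
    w = [wi + dwi for wi, dwi in zip(w, dw)]
final_error = sq_error(w)
```

In this sketch the update uses only the neuron's own input, its own noise, and one scalar broadcast at the end of the episode, which is the locality property the abstract emphasizes. With `DECAY = 1.0` the expected update equals the gradient of the noise-free return. Smaller values weight recent steps more heavily but still point in an ascent direction for this toy problem.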
Abstract: Advances in deep learning techniques for satellite imagery have driven major progress in weather forecasting based on remote sensing data. This paper investigates multi-step-ahead frame prediction for coastal sea elements in the Netherlands using U-Net based architectures. Hourly data from the Copernicus observation programme, spanning a period of two years, are used to train the models and produce forecasts, including seasonal predictions. We propose a variation of the U-Net architecture and extend this novel model with residual connections, parallel convolutions, and asymmetric convolutions to obtain three additional architectures. We show that the architecture equipped with parallel and asymmetric convolutions as well as skip connections is particularly suited to this task, outperforming the other three models discussed.
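The asymmetric and parallel convolutions mentioned above can be illustrated with a small, dependency-free sketch. It is not the architecture from the paper: the kernels, the 5x5 input frame, and the helper `conv2d_valid` are hypothetical, and a real model would learn the kernel values and use a deep learning framework. The sketch shows that a 3x1 convolution followed by a 1x3 convolution covers the same 3x3 receptive field as a square kernel while using 6 weights instead of 9, and that a parallel block simply runs several branches on the same input and stacks their outputs as channels.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 single-channel input frame with small integer values.
frame = [[(3 * r + 2 * c) % 7 for c in range(5)] for r in range(5)]

# Asymmetric pair: a 3x1 kernel followed by a 1x3 kernel (3 + 3 = 6 weights).
col = [[1], [2], [1]]
row = [[1, 0, -1]]
asym = conv2d_valid(conv2d_valid(frame, col), row)

# The equivalent square 3x3 kernel is the outer product of the pair (9 weights).
full = [[col[a][0] * row[0][b] for b in range(3)] for a in range(3)]
square = conv2d_valid(frame, full)

# Parallel block: a second branch sees the same input, and the branch outputs
# are stacked as separate feature maps (channels).
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
parallel = [asym, conv2d_valid(frame, box)]
```

The factorized pair reproduces the square kernel exactly only when that kernel is an outer product of a column and a row, as it is here. In a trained network the two asymmetric kernels are learned freely, so the saving in weights comes at the cost of a more restricted family of filters.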