Abstract: Physical neural networks typically train linear synaptic weights while treating device nonlinearities as fixed. We show the opposite: training the synaptic nonlinearity itself, as in Kolmogorov-Arnold Network (KAN) architectures, yields markedly higher task performance per physical resource and better performance-parameter scaling than conventional linear-weight networks, demonstrating the ability of KAN topologies to exploit reconfigurable nonlinear physical dynamics. We experimentally realise physical KANs in silicon-on-insulator devices we term 'Synaptic Nonlinear Elements' (SYNEs), operating at room temperature with microampere currents, 2 MHz speeds, and ~250 fJ per nonlinear operation, with no observed degradation over 10^13 measurements and months-long timescales. We demonstrate nonlinear function regression, classification, and prediction of Li-ion battery dynamics from noisy real-world multi-sensor data. Physical KANs outperform equivalently parameterised software multilayer perceptrons across all tasks, using up to two orders of magnitude fewer parameters, and require two orders of magnitude fewer devices than linear-weight-based physical networks. These results establish learned physical nonlinearity as a hardware-native computational primitive for compact and efficient learning systems, and SYNE devices as effective substrates for heterogeneous nonlinear computing.
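The core architectural idea, training a per-edge nonlinearity rather than a linear weight, can be sketched in software. The toy layer below is a minimal, hypothetical illustration in PyTorch (not the SYNE hardware or the authors' implementation): each input-output edge applies a learnable nonlinearity parameterised as a trainable combination of Gaussian basis functions, and a short regression loop fits those shapes to data. All class names, basis choices, and hyperparameters are assumptions made for illustration only.

```python
# Sketch of a KAN-style layer with learned edge nonlinearities (illustrative only).
import torch
import torch.nn as nn


class LearnedEdgeNonlinearity(nn.Module):
    """One KAN-style edge: phi(x) = sum_k c_k * exp(-(x - mu_k)^2 / (2 w^2))."""

    def __init__(self, n_basis: int = 8, x_min: float = -1.0, x_max: float = 1.0):
        super().__init__()
        self.register_buffer("centres", torch.linspace(x_min, x_max, n_basis))
        self.log_width = nn.Parameter(torch.tensor(0.0))        # shared basis width
        self.coeffs = nn.Parameter(torch.randn(n_basis) * 0.1)  # trainable shape

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) -> (batch,)
        width = self.log_width.exp()
        basis = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2) / (2 * width ** 2))
        return basis @ self.coeffs


class KANLayer(nn.Module):
    """KAN-style layer: y_j = sum_i phi_ij(x_i), with a learned phi on every edge."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.edges = nn.ModuleList(
            [LearnedEdgeNonlinearity() for _ in range(in_features * out_features)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, out_features)
        outputs = []
        for j in range(self.out_features):
            contributions = [
                self.edges[j * self.in_features + i](x[:, i])
                for i in range(self.in_features)
            ]
            outputs.append(torch.stack(contributions, dim=0).sum(dim=0))
        return torch.stack(outputs, dim=1)


# Toy regression: fit y = sin(pi * x0) + x1^2 by shaping the edge nonlinearities.
if __name__ == "__main__":
    torch.manual_seed(0)
    model = KANLayer(in_features=2, out_features=1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x = torch.rand(256, 2) * 2 - 1
    y = torch.sin(torch.pi * x[:, :1]) + x[:, 1:] ** 2
    for step in range(2000):
        opt.zero_grad()
        loss = torch.mean((model(x) - y) ** 2)
        loss.backward()
        opt.step()
    print(f"final MSE: {loss.item():.4f}")
```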
Abstract: Physically implemented neural networks hold the potential to achieve the performance of deep learning models by exploiting the innate physical properties of devices as computational tools. Exploiting physical processes for computation also requires considering their intrinsic dynamics, which can serve as valuable resources for processing information. However, existing computational methods are unable to extend the success of deep learning techniques to parameters that influence device dynamics, which often lack a precise mathematical description. In this work, we formulate a universal framework to optimise interactions with dynamic physical systems in a fully data-driven fashion. The framework adopts neural stochastic differential equations (neural SDEs) as differentiable digital twins, effectively capturing both the deterministic and stochastic behaviours of devices. Differentiating through the trained models provides the mathematical estimates essential for optimising a physical neural network, harnessing the intrinsic temporal computation abilities of its physical nodes. To accurately model the behaviour of real devices, we formulate neural-SDE variants that can operate under a variety of experimental settings. Our work demonstrates the framework's applicability through simulations and physical implementations of interacting dynamic devices, while highlighting the importance of accurately capturing system stochasticity for the successful deployment of a physically defined neural network.
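As a rough illustration of the described workflow, the sketch below (a toy, assumption-laden example, not the paper's implementation) builds a neural-SDE digital twin whose drift and diffusion are small networks, fits it to one-step device transitions via a Gaussian transition likelihood under an Euler-Maruyama discretisation, and then differentiates through the frozen twin to optimise a control parameter of a simulated physical node. All names, data, and hyperparameters are hypothetical.

```python
# Sketch: neural-SDE digital twin + gradient-based control through it (illustrative only).
import torch
import torch.nn as nn


class NeuralSDE(nn.Module):
    """dx = f_theta(x, u) dt + g_theta(x, u) dW, with small networks f and g."""

    def __init__(self, state_dim: int = 1, control_dim: int = 1, hidden: int = 32):
        super().__init__()
        in_dim = state_dim + control_dim
        self.drift = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, state_dim))
        self.diffusion = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, state_dim), nn.Softplus())

    def step(self, x, u, dt):
        """One differentiable Euler-Maruyama step."""
        inp = torch.cat([x, u], dim=-1)
        noise = torch.randn_like(x)
        return x + self.drift(inp) * dt + self.diffusion(inp) * noise * dt.sqrt()

    def rollout(self, x0, u, n_steps, dt):
        """Simulate a trajectory under a constant control u."""
        xs, x = [x0], x0
        for _ in range(n_steps):
            x = self.step(x, u, dt)
            xs.append(x)
        return torch.stack(xs, dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    dt = torch.tensor(0.01)
    twin = NeuralSDE()

    # (1) Fit the twin to (stand-in) recorded device transitions (x_t, u_t, x_{t+1})
    #     by maximising the Gaussian Euler-Maruyama transition likelihood.
    x_t = torch.randn(512, 1)
    u_t = torch.rand(512, 1)
    x_next = x_t + (-x_t + u_t) * dt + 0.05 * torch.randn_like(x_t) * dt.sqrt()
    fit_opt = torch.optim.Adam(twin.parameters(), lr=1e-3)
    for _ in range(500):
        fit_opt.zero_grad()
        inp = torch.cat([x_t, u_t], dim=-1)
        mean = x_t + twin.drift(inp) * dt
        std = twin.diffusion(inp) * dt.sqrt() + 1e-4
        nll = (0.5 * ((x_next - mean) / std) ** 2 + std.log()).mean()
        nll.backward()
        fit_opt.step()

    # (2) Freeze the twin and differentiate through its rollout to choose a
    #     control that drives the simulated state towards a target value.
    for p in twin.parameters():
        p.requires_grad_(False)
    u = torch.zeros(1, 1, requires_grad=True)
    ctrl_opt = torch.optim.Adam([u], lr=5e-2)
    target = torch.tensor(0.5)
    for _ in range(200):
        ctrl_opt.zero_grad()
        traj = twin.rollout(torch.zeros(1, 1), u, n_steps=50, dt=dt)
        task_loss = (traj[-1] - target).pow(2).mean()
        task_loss.backward()
        ctrl_opt.step()
    print(f"optimised control: {u.item():.3f}, final-state error: {task_loss.item():.4f}")
```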