Abstract:Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and learning are essential for privacy and low-latency adaptation. Event cameras address the efficiency of visual sensing with sparse, asynchronous output that is naturally compatible with neuromorphic processing. Yet no prior system has deployed a continual on-device learning pipeline for event-based action recognition using neuromorphic hardware. We present CLANE, Continual Learning of Actions on Neuromorphic Hardware from Event Cameras, deployed end-to-end on Intel Loihi 2. CLANE combines a spiking 2D CNN for spatiotemporal feature extraction with CLP-SNN as its on-chip learning head, extended to action clips via a Temporal Aggregation Layer and a fixed-point Normalization Layer, both novel Loihi 2 modules. On THU E-ACT-50, a 50-class dataset captured under real-world conditions, CLANE achieves 70.4% accuracy in a continual learning task while delivering more than 100x energy reduction and 16x lower latency over a sequential CNN+GRU+CLP edge GPU baseline, validated through iso-algorithm cross-platform benchmarking across three evaluation levels.




Abstract:We propose an approximation of Echo State Networks (ESN) that can be efficiently implemented on digital hardware based on the mathematics of hyperdimensional computing. The reservoir of the proposed Integer Echo State Network (intESN) is a vector containing only n-bits integers (where n<8 is normally sufficient for a satisfactory performance). The recurrent matrix multiplication is replaced with an efficient cyclic shift operation. The intESN architecture is verified with typical tasks in reservoir computing: memorizing of a sequence of inputs; classifying time-series; learning dynamic processes. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss.