Abstract:Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes enabling efficient low-latency processing. To distinguish moving objects while the event-based camera is in motion the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a Spiking Convolutional Neural Network bioinspired attention system for selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, to identify the ROI and saccade toward it. The system, characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. The detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows the system's 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications serving as a basis for more complex architectures.
Abstract:Event cameras offer low-latency and data compression for visual applications, through event-driven operation, that can be exploited for edge processing in tiny autonomous agents. Robust, accurate and low latency extraction of highly informative features such as corners is key for most visual processing. While several corner detection algorithms have been proposed, state-of-the-art performance is achieved by luvHarris. However, this algorithm requires a high number of memory accesses per event, making it less-than ideal for low-latency, low-energy implementation in tiny edge processors. In this paper, we propose a new event-driven corner detection implementation tailored for edge computing devices, which requires much lower memory access than luvHarris while also improving accuracy. Our method trades computation for memory access, which is more expensive for large memories. For a DAVIS346 camera, our method requires ~3.8X less memory, ~36.6X less memory accesses with only ~2.3X more computes.
Abstract:Neuromorphic computing relies on spike-based, energy-efficient communication, inherently implying the need for conversion between real-valued (sensory) data and binary, sparse spiking representation. This is usually accomplished using the real valued data as current input to a spiking neuron model, and tuning the neuron's parameters to match a desired, often biologically inspired behaviour. We developed a tool, the WaLiN-GUI, that supports the investigation of neuron models and parameter combinations to identify suitable configurations for neuron-based encoding of sample-based data into spike trains. Due to the generalized LIF model implemented by default, next to the LIF and Izhikevich neuron models, many spiking behaviors can be investigated out of the box, thus offering the possibility of tuning biologically plausible responses to the input data. The GUI is provided open source and with documentation, being easy to extend with further neuron models and personalize with data analysis functions.
Abstract:The field of neuromorphic computing holds great promise in terms of advancing computing efficiency and capabilities by following brain-inspired principles. However, the rich diversity of techniques employed in neuromorphic research has resulted in a lack of clear standards for benchmarking, hindering effective evaluation of the advantages and strengths of neuromorphic methods compared to traditional deep-learning-based methods. This paper presents a collaborative effort, bringing together members from academia and the industry, to define benchmarks for neuromorphic computing: NeuroBench. The goals of NeuroBench are to be a collaborative, fair, and representative benchmark suite developed by the community, for the community. In this paper, we discuss the challenges associated with benchmarking neuromorphic solutions, and outline the key features of NeuroBench. We believe that NeuroBench will be a significant step towards defining standards that can unify the goals of neuromorphic computing and drive its technological progress. Please visit neurobench.ai for the latest updates on the benchmark tasks and metrics.
Abstract:Prediction skills can be crucial for the success of tasks where robots have limited time to act or joints actuation power. In such a scenario, a vision system with a fixed, possibly too low, sampling rate could lead to the loss of informative points, slowing down prediction convergence and reducing the accuracy. In this paper, we propose to exploit the low latency, motion-driven sampling, and data compression properties of event cameras to overcome these issues. As a use-case, we use a Panda robotic arm to intercept a ball bouncing on a table. To predict the interception point, we adopt a Stateful LSTM network, a specific LSTM variant without fixed input length, which perfectly suits the event-driven paradigm and the problem at hand, where the length of the trajectory is not defined. We train the network in simulation to speed up the dataset acquisition and then fine-tune the models on real trajectories. Experimental results demonstrate how using a dense spatial sampling (i.e. event cameras) significantly increases the number of intercepted trajectories as compared to a fixed temporal sampling (i.e. frame-based cameras).
Abstract:In the brain, information is encoded, transmitted and used to inform behaviour at the level of timing of action potentials distributed over population of neurons. To implement neural-like systems in silico, to emulate neural function, and to interface successfully with the brain, neuromorphic circuits need to encode information in a way compatible to that used by populations of neuron in the brain. To facilitate the cross-talk between neuromorphic engineering and neuroscience, in this Review we first critically examine and summarize emerging recent findings about how population of neurons encode and transmit information. We examine the effects on encoding and readout of information for different features of neural population activity, namely the sparseness of neural representations, the heterogeneity of neural properties, the correlations among neurons, and the time scales (from short to long) at which neurons encode information and maintain it consistently over time. Finally, we critically elaborate on how these facts constrain the design of information coding in neuromorphic circuits. We focus primarily on the implications for designing neuromorphic circuits that communicate with the brain, as in this case it is essential that artificial and biological neurons use compatible neural codes. However, we also discuss implications for the design of neuromorphic systems for implementation or emulation of neural computation.
Abstract:Spatio-temporal pattern recognition is a fundamental ability of the brain which is required for numerous real-world applications. Recent deep learning approaches have reached outstanding accuracy in such tasks, but their implementation on conventional embedded solutions is still very computationally and energy expensive. Tactile sensing in robotic applications is a representative example where real-time processing and energy-efficiency are required. Following a brain-inspired computing approach, we propose a new benchmark for spatio-temporal tactile pattern recognition at the edge through braille letters reading. We recorded a new braille letters dataset based on the capacitive tactile sensors/fingertip of the iCub robot, then we investigated the importance of temporal information and the impact of event-based encoding for spike-based/event-based computation. Afterwards, we trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for fast and efficient inference. We confronted our approach to standard classifiers, in particular to a Long Short-Term Memory (LSTM) deployed on the embedded Nvidia Jetson GPU in terms of classification accuracy, power/energy consumption and computational delay. Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy-efficient than the LSTM on Jetson, requiring an average power of only 31mW. This work proposes a new benchmark for tactile sensing and highlights the challenges and opportunities of event-based encoding, neuromorphic hardware and spike-based computing for spatio-temporal pattern recognition at the edge.
Abstract:Low latency and accuracy are fundamental requirements when vision is integrated in robots for high-speed interaction with targets, since they affect system reliability and stability. In such a scenario, the choice of the sensor and algorithms is important for the entire control loop. The technology of event-cameras can guarantee fast visual sensing in dynamic environments, but requires a tracking algorithm that can keep up with the high data rate induced by the robot ego-motion while maintaining accuracy and robustness to distractors. In this paper, we introduce a novel tracking method that leverages the Exponential Reduced Ordinal Surface (EROS) data representation to decouple event-by-event processing and tracking computation. The latter is performed using convolution kernels to detect and follow a circular target moving on a plane. To benchmark state-of-the-art event-based tracking, we propose the task of tracking the air hockey puck sliding on a surface, with the future aim of controlling the iCub robot to reach the target precisely and on time. Experimental results demonstrate that our algorithm achieves the best compromise between low latency and tracking accuracy both when the robot is still and when moving.
Abstract:There have been a number of corner detection methods proposed for event cameras in the last years, since event-driven computer vision has become more accessible. Current state-of-the-art have either unsatisfactory accuracy or real-time performance when considered for practical use; random motion using a live camera in an unconstrained environment. In this paper, we present yet another method to perform corner detection, dubbed look-up event-Harris (luvHarris), that employs the Harris algorithm for high accuracy but manages an improved event throughput. Our method has two major contributions, 1. a novel "threshold ordinal event-surface" that removes certain tuning parameters and is well suited for Harris operations, and 2. an implementation of the Harris algorithm such that the computational load per-event is minimised and computational heavy convolutions are performed only 'as-fast-as-possible', i.e. only as computational resources are available. The result is a practical, real-time, and robust corner detector that runs more than $2.6\times$ the speed of current state-of-the-art; a necessity when using high-resolution event-camera in real-time. We explain the considerations taken for the approach, compare the algorithm to current state-of-the-art in terms of computational performance and detection accuracy, and discuss the validity of the proposed approach for event cameras.
Abstract:In this chapter we describe the history and evolution of the iCub humanoid platform. We start by describing the first version as it was designed during the RobotCub EU project and illustrate how it evolved to become the platform that is adopted by more than 30 laboratories world wide. We complete the chapter by illustrating some of the research activities that are currently carried out on the iCub robot, i.e. visual perception, event driven sensing and dynamic control. We conclude the Chapter with a discussion of the lessons we learned and a preview of the upcoming next release of the robot, iCub 3.0.