Abstract:We show that many perception tasks, from visual recognition, semantic segmentation, optical flow, depth estimation to vocalization discrimination, are highly redundant functions of their input data. Images or spectrograms, projected into different subspaces, formed by orthogonal bases in pixel, Fourier or wavelet domains, can be used to solve these tasks remarkably well regardless of whether it is the top subspace where data varies the most, some intermediate subspace with moderate variability--or the bottom subspace where data varies the least. This phenomenon occurs because different subspaces have a large degree of redundant information relevant to the task.
Abstract:Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms such as the Nvidia Jetson NX.
Abstract:This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high speed motion. As a representative scenario, we consider the case of a robot at rest reacting to a small, fast approaching object at speeds higher than 15m/s. Since conventional image sensors at typical frame rates observe such an object for only a few frames, estimating the underlying motion presents a considerable challenge for standard computer vision systems and algorithms. In this paper we present a method motivated by how animals such as insects solve this problem with their relatively simple vision systems. Our solution takes the event stream from a DVS and first encodes the temporal events with a set of causal exponential filters across multiple time scales. We couple these filters with a Convolutional Neural Network (CNN) to efficiently extract relevant spatiotemporal features. The combined network learns to output both the expected time to collision of the object, as well as the predicted collision point on a discretized polar grid. These critical estimates are computed with minimal delay by the network in order to react appropriately to the incoming object. We highlight the results of our system to a toy dart moving at 23.4m/s with a 24.73{\deg} error in ${\theta}$, 18.4mm average discretized radius prediction error, and 25.03% median time to collision prediction error.
Abstract:This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs). We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm. Our system is composed of (i) a near-chip event filter that compresses and denoises the event stream from the DVS, and (ii) a Binary Neural Network (BNN) detection module that runs on a low-computation edge computing device (in our case a STM32F4 microcontroller). We present the system architecture and provide an end-to-end implementation for pedestrian detection in an office environment. Our implementation reduces transmission size by up to 99.6% compared to transmitting the raw event stream. The average packet size in our system is only 1397 bits, while 307.2 kb are required to send an uncompressed DVS time window. Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%. The low bandwidth and energy properties of our system make it ideal for IoT applications.