Anthony Bisulco

Many Perception Tasks are Highly Redundant Functions of their Input Data

Jul 18, 2024

Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran Wei, Vijay Balasubramanian, Kostas Daniilidis, Pratik Chaudhari

Abstract: We show that many perception tasks, from visual recognition, semantic segmentation, optical flow, and depth estimation to vocalization discrimination, are highly redundant functions of their input data. Images or spectrograms, projected into different subspaces formed by orthogonal bases in pixel, Fourier, or wavelet domains, can be used to solve these tasks remarkably well, regardless of whether it is the top subspace where data varies the most, some intermediate subspace with moderate variability, or the bottom subspace where data varies the least. This phenomenon occurs because different subspaces have a large degree of redundant information relevant to the task.
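
The subspace projection described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes 1-D signals, an orthonormal DCT-II basis as a stand-in for the Fourier-domain basis, and hypothetical helper names (`dct_basis`, `project`, `variance_ordered_bands`). Basis directions are ranked by how much the data varies along them and then split into bands, from the "top" band (most variance) to the "bottom" band (least).

```python
import math

def dct_basis(n):
    """Rows form an orthonormal DCT-II basis of R^n (an assumed choice of basis)."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis

def project(x, basis, indices):
    """Project x onto the span of the selected basis vectors."""
    out = [0.0] * len(x)
    for k in indices:
        coeff = sum(b * v for b, v in zip(basis[k], x))
        for i, b in enumerate(basis[k]):
            out[i] += coeff * b
    return out

def variance_ordered_bands(data, basis, num_bands):
    """Group basis indices into bands, ordered from highest to lowest data variance."""
    n = len(basis)
    coeffs = [[sum(b * v for b, v in zip(basis[k], x)) for k in range(n)] for x in data]

    def variance(k):
        column = [c[k] for c in coeffs]
        mean = sum(column) / len(column)
        return sum((c - mean) ** 2 for c in column) / len(column)

    order = sorted(range(n), key=variance, reverse=True)
    size = n // num_bands
    bands = [order[i * size:(i + 1) * size] for i in range(num_bands - 1)]
    bands.append(order[(num_bands - 1) * size:])  # last band takes any remainder
    return bands
```

A task model would then be trained separately on `project(x, basis, band)` for each band; the abstract's claim is that accuracy stays high even for the lowest-variance band.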

EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks

Apr 14, 2023

Ziyun Wang, Fernando Cladera Ojeda, Anthony Bisulco, Daewon Lee, Camillo J. Taylor, Kostas Daniilidis, M. Ani Hsieh, Daniel D. Lee, Volkan Isler

Abstract: Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s, even on compute-constrained embedded platforms such as the Nvidia Jetson NX.

* 8 pages, 6 figures, IEEE Robotics and Automation Letters (Volume: 7, Issue: 4, October 2022)
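
The abstract names the Binary Event History Image (BEHI) but does not define it. The sketch below shows one plausible reading, stated as an assumption: a pixel is set to 1 if at least one event fired there within a time window, regardless of polarity or count. The function name and the `(x, y, t)` event tuple layout are hypothetical.

```python
def binary_event_history_image(events, width, height, t_start, t_end):
    """Build a binary image marking pixels that saw any event in [t_start, t_end].

    events: iterable of (x, y, t) tuples (assumed layout; polarity is ignored).
    Returns a height x width list of lists containing 0 or 1.
    """
    image = [[0] * width for _ in range(height)]
    for x, y, t in events:
        inside_window = t_start <= t <= t_end
        inside_frame = 0 <= x < width and 0 <= y < height
        if inside_window and inside_frame:
            image[y][x] = 1  # repeated events at a pixel do not change the value
    return image
```

Because each pixel stores a single bit and every event is handled with one write, a representation of this kind is cheap to build and to ship to a small network, which fits the low-latency goal the abstract describes.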

Fast Motion Understanding with Spatiotemporal Neural Networks and Dynamic Vision Sensors

Nov 18, 2020

Anthony Bisulco, Fernando Cladera Ojeda, Volkan Isler, Daniel D. Lee

Abstract: This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high-speed motion. As a representative scenario, we consider the case of a robot at rest reacting to a small, fast-approaching object at speeds higher than 15 m/s. Since conventional image sensors at typical frame rates observe such an object for only a few frames, estimating the underlying motion presents a considerable challenge for standard computer vision systems and algorithms. In this paper we present a method motivated by how animals such as insects solve this problem with their relatively simple vision systems. Our solution takes the event stream from a DVS and first encodes the temporal events with a set of causal exponential filters across multiple time scales. We couple these filters with a Convolutional Neural Network (CNN) to efficiently extract relevant spatiotemporal features. The combined network learns to output both the expected time to collision of the object and the predicted collision point on a discretized polar grid. These critical estimates are computed with minimal delay by the network in order to react appropriately to the incoming object. We highlight the results of our system on a toy dart moving at 23.4 m/s, with a 24.73° error in θ, an 18.4 mm average discretized radius prediction error, and a 25.03% median time to collision prediction error.
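
The encoding step mentioned in this abstract, causal exponential filters across multiple time scales, can be sketched for a single pixel as follows. This is an illustrative assumption rather than the paper's implementation: each event at time t_i contributes exp(-(t - t_i) / tau) to the response at query time t, and the function name and the choice of time constants are hypothetical.

```python
import math

def exponential_filter_bank(event_times, query_time, time_constants):
    """Return one filter response per time constant for a single pixel.

    Each response is the sum of exp(-(query_time - t_i) / tau) over events
    with t_i <= query_time. Events after query_time never contribute, which
    is what makes the filter causal.
    """
    responses = []
    for tau in time_constants:
        total = 0.0
        for t in event_times:
            if t <= query_time:
                total += math.exp(-(query_time - t) / tau)
        responses.append(total)
    return responses
```

Small time constants respond mostly to very recent events, while large ones retain a longer history. Stacking the responses per pixel yields a multi-channel image that a CNN can consume.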

Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian Detection

Apr 03, 2020

Anthony Bisulco, Fernando Cladera Ojeda, Volkan Isler, Daniel D. Lee

Abstract: This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs). We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm. Our system is composed of (i) a near-chip event filter that compresses and denoises the event stream from the DVS, and (ii) a Binary Neural Network (BNN) detection module that runs on a low-computation edge computing device (in our case an STM32F4 microcontroller). We present the system architecture and provide an end-to-end implementation for pedestrian detection in an office environment. Our implementation reduces transmission size by up to 99.6% compared to transmitting the raw event stream. The average packet size in our system is only 1397 bits, while 307.2 kb are required to send an uncompressed DVS time window. Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%. The low bandwidth and energy properties of our system make it ideal for IoT applications.

* 6 pages, 5 figures
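
The bandwidth figures quoted in the abstract can be checked with a short calculation. The sketch assumes that "kb" means 1000 bits, and the function name is hypothetical.

```python
def transmission_reduction(compressed_bits, raw_bits):
    """Fraction of the raw transmission size saved by sending the compressed packet."""
    return 1.0 - compressed_bits / raw_bits

average_packet_bits = 1397        # average packet size from the abstract
raw_window_bits = 307.2 * 1000    # 307.2 kb, assuming 1 kb = 1000 bits

reduction = transmission_reduction(average_packet_bits, raw_window_bits)
print(f"average reduction: {reduction:.2%}")
```

The average packet works out to a reduction of roughly 99.5%, which is consistent with the "up to 99.6%" figure if that number refers to the best case rather than the average.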
