Istvan Gyongy

Joint Depth and Reflectivity Estimation using Single-Photon LiDAR

May 19, 2025

Hashan K. Weerasooriya, Prateek Chennuri, Weijian Zhang, Istvan Gyongy, Stanley H. Chan

Abstract:Single-Photon Light Detection and Ranging (SP-LiDAR) is emerging as a leading technology for long-range, high-precision 3D vision tasks. In SP-LiDAR, timestamps encode two complementary pieces of information: pulse travel time (depth) and the number of photons reflected by the object (reflectivity). Existing SP-LiDAR reconstruction methods typically recover depth and reflectivity separately or sequentially use one modality to estimate the other. Moreover, the conventional 3D histogram construction is effective mainly for slow-moving or stationary scenes. In dynamic scenes, however, it is more efficient and effective to directly process the timestamps. In this paper, we introduce an estimation method to simultaneously recover both depth and reflectivity in fast-moving scenes. We offer two contributions: (1) A theoretical analysis demonstrating the mutual correlation between depth and reflectivity and the conditions under which joint estimation becomes beneficial. (2) A novel reconstruction method, "SPLiDER", which exploits the shared information to enhance signal recovery. On both synthetic and real SP-LiDAR data, our method outperforms existing approaches, achieving superior joint reconstruction quality.
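To make concrete how one pixel's timestamps carry both quantities, here is a minimal Python sketch of a naive per-pixel baseline. It is not the SPLiDER method from the paper. The function name, the example timestamps, and the choice to ignore background photons are all assumptions made for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def naive_depth_and_count(timestamps_s):
    """Return (depth in metres, photon count) for one pixel.

    Depth comes from the mean round-trip time. The photon count is only a
    crude stand-in for reflectivity. Background photons are ignored
    (an assumption of this sketch).
    """
    if not timestamps_s:
        return None, 0
    mean_t = sum(timestamps_s) / len(timestamps_s)
    depth_m = C * mean_t / 2.0  # halve: the pulse travels out and back
    return depth_m, len(timestamps_s)


# Hypothetical timestamps clustered around 66.7 ns (a target near 10 m)
stamps = [66.5e-9, 66.7e-9, 66.9e-9]
depth, count = naive_depth_and_count(stamps)
print(f"depth = {depth:.3f} m, photons = {count}")
```

This baseline treats the two quantities independently, which is the kind of separate recovery the abstract contrasts with joint estimation.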

Quanta Video Restoration

Oct 19, 2024

Prateek Chennuri, Yiheng Chi, Enze Jiang, G. M. Dilshan Godaliyadda, Abhiram Gnanasambandam, Hamid R. Sheikh, Istvan Gyongy, Stanley H. Chan

Abstract:The proliferation of single-photon image sensors has opened the door to a plethora of high-speed and low-light imaging applications. However, data collected by these sensors are often 1-bit or few-bit, and corrupted by noise and strong motion. Conventional video restoration methods are not designed to handle this situation, while specialized quanta burst algorithms have limited performance when the number of input frames is low. In this paper, we introduce Quanta Video Restoration (QUIVER), an end-to-end trainable network built on the core ideas of classical quanta restoration methods, i.e., pre-filtering, flow estimation, fusion, and refinement. We also collect and publish I2-2000FPS, a high-speed video dataset with the highest temporal resolution of 2000 frames-per-second, for training and testing. On simulated and real data, QUIVER outperforms existing quanta restoration methods by a significant margin. Code and dataset available at https://github.com/chennuriprateek/Quanta_Video_Restoration-QUIVER-

* European Conference on Computer Vision (ECCV) 2024

Resolution Limit of Single-Photon LiDAR

Mar 31, 2024

Stanley H. Chan, Hashan K. Weerasooriya, Weijian Zhang, Pamela Abshire, Istvan Gyongy, Robert K. Henderson

Abstract:Single-photon Light Detection and Ranging (LiDAR) systems are often equipped with an array of detectors for improved spatial resolution and sensing speed. However, given a fixed amount of flux produced by the laser transmitter across the scene, the per-pixel Signal-to-Noise Ratio (SNR) will decrease when more pixels are packed in a unit space. This presents a fundamental trade-off between the spatial resolution of the sensor array and the SNR received at each pixel. Theoretical characterization of this fundamental limit is explored. By deriving the photon arrival statistics and introducing a series of new approximation techniques, the Mean Squared Error (MSE) of the maximum-likelihood estimator of the time delay is derived. The theoretical predictions align well with simulations and real data.
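The trade-off described above can be illustrated with a textbook shot-noise model. The sketch below is not the paper's derivation: it assumes Poisson-distributed counts, a fixed total signal and background photon budget split evenly across pixels, and a simple SNR definition of signal divided by the standard deviation of the total count. The function name and the photon numbers are hypothetical.

```python
import math


def per_pixel_snr(total_signal, total_background, num_pixels):
    """Shot-noise-limited SNR at one pixel under an even split of photons."""
    s = total_signal / num_pixels      # mean signal photons per pixel
    b = total_background / num_pixels  # mean background photons per pixel
    return s / math.sqrt(s + b)        # Poisson: variance equals the mean


# Hypothetical budget: 10,000 signal and 2,500 background photons in total
for n in (1, 4, 16, 64):
    print(n, round(per_pixel_snr(10_000, 2_500, n), 2))
```

Under these assumptions the per-pixel SNR scales as one over the square root of the pixel count, so quadrupling the number of pixels halves the SNR at each pixel.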

TDC-less Direct Time-of-Flight Imaging Using Spiking Neural Networks

Jan 19, 2024

Jack MacLean, Brian Stewart, Istvan Gyongy

Abstract:3D depth sensors using single-photon avalanche diodes (SPADs) are becoming increasingly common in applications such as autonomous navigation and object detection. Recent designs implement on-chip histogramming time-to-digital converters (TDCs) to compress the photon timestamps and reduce the bottleneck in the read-out and processing of large volumes of photon data. However, the use of full histogramming with large SPAD arrays poses significant challenges due to the associated demands in silicon area and power consumption. We propose a TDC-less dToF sensor which uses Spiking Neural Networks (SNN) to process the SPAD events directly. The proposed SNN is trained and tested on synthetic SPAD events, and while it offers five times lower precision in depth prediction than a classic centre-of-mass (CoM) algorithm (applied to histograms of the events), it achieves similar Mean Absolute Error (MAE) with faster processing speeds, and significantly lower power consumption is anticipated.

* 7 Pages, 9 Figures
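For readers unfamiliar with the centre-of-mass (CoM) baseline mentioned in the abstract, a minimal Python sketch follows. It assumes a histogram of photon counts with uniform bin width and no background subtraction or peak windowing, both of which a practical implementation would likely add. The function name, bin width, and example histogram are hypothetical and not taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def centre_of_mass_depth(histogram, bin_width_s):
    """Estimate depth in metres from a timing histogram via its centre of mass."""
    total = sum(histogram)
    if total == 0:
        return None  # no photons detected, so no estimate
    # Use bin centres: a photon in bin k is assigned time (k + 0.5) * width
    mean_bin = sum((k + 0.5) * n for k, n in enumerate(histogram)) / total
    return C * mean_bin * bin_width_s / 2.0  # halve for the round trip


# Hypothetical 8-bin histogram with 1 ns bins and a symmetric return peak
hist = [0, 1, 6, 10, 6, 1, 0, 0]
depth = centre_of_mass_depth(hist, 1e-9)
print(f"{depth:.4f} m")
```

The weighted mean gives sub-bin resolution, which is why CoM on histograms is a common reference point for depth precision.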

Video super-resolution for single-photon LIDAR

Oct 19, 2022

Germán Mora Martín, Stirling Scholes, Alice Ruget, Robert K. Henderson, Jonathan Leach, Istvan Gyongy

Abstract:3D Time-of-Flight (ToF) image sensors are used widely in applications such as self-driving cars, Augmented Reality (AR) and robotics. When implemented with Single-Photon Avalanche Diodes (SPADs), compact, array format sensors can be made that offer accurate depth maps over long distances, without the need for mechanical scanning. However, array sizes tend to be small, leading to low lateral resolution, which combined with low Signal-to-Noise Ratio (SNR) levels under high ambient illumination, may lead to difficulties in scene interpretation. In this paper, we use synthetic depth sequences to train a 3D Convolutional Neural Network (CNN) for denoising and upscaling (x4) depth data. Experimental results, based on synthetic as well as real ToF data, are used to demonstrate the effectiveness of the scheme. With GPU acceleration, frames are processed at >30 frames per second, making the approach suitable for low-latency imaging, as required for obstacle avoidance.

* 18 pages, 10 figures, 3 tables

Simulating single-photon detector array sensors for depth imaging

Oct 07, 2022

Stirling Scholes, Germán Mora-Martín, Feng Zhu, Istvan Gyongy, Phil Soan, Jonathan Leach

Abstract:Single-Photon Avalanche Detector (SPAD) arrays are a rapidly emerging technology. These multi-pixel sensors have single-photon sensitivities and picosecond temporal resolutions; thus, they can rapidly generate depth images with millimeter precision. Such sensors are a key enabling technology for future autonomous systems as they provide guidance and situational awareness. However, to fully exploit the capabilities of SPAD array sensors, it is crucial to establish the quality of depth images they are able to generate in a wide range of scenarios. Given a particular optical system and a finite image acquisition time, what is the best-case depth resolution and what are realistic images generated by SPAD arrays? In this work, we establish a robust yet simple numerical procedure that rapidly establishes the fundamental limits to depth imaging with SPAD arrays under real-world conditions. Our approach accurately generates realistic depth images in a wide range of scenarios, allowing the performance of an optical depth imaging system to be established without the need for costly and laborious field testing. This procedure has applications in object detection and tracking for autonomous systems and could be easily extended to systems for underwater imaging or for imaging around corners.

A direct time-of-flight image sensor with in-pixel surface detection and dynamic vision

Sep 23, 2022

Istvan Gyongy, Ahmet T. Erdogan, Neale A. W. Dutton, Germán Mora Martín, Alistair Gorman, Hanning Mai, Francesco Mattioli Della Rocca, Robert K. Henderson

Abstract:3D flash LIDAR is an alternative to the traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor, and free of moving parts, for applications such as self-driving cars, robotics and augmented reality (AR). Typically implemented using single-photon, direct time-of-flight (dToF) receivers in image sensor format, the operation of the devices can be hindered by the large number of photon events needing to be processed and compressed in outdoor scenarios, limiting frame rates and scalability to larger arrays. We here present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming, which lock onto and track the return signal. This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range or 100 kFPS for direct depth readings. The sensor offers selective readout of pixels detecting surfaces, or those sensing motion, leading to reduced power consumption and off-chip processing requirements. We demonstrate the application of the sensor in mid-range LIDAR.

* 24 pages, 16 figures. The visualisations may be viewed by clicking on the hyperlinks in the text

DronePose: The identification, segmentation, and orientation detection of drones via neural networks

Dec 10, 2021

Stirling Scholes, Alice Ruget, German Mora-Martin, Feng Zhu, Istvan Gyongy, Jonathan Leach

Abstract:The growing ubiquity of drones has raised concerns over the ability of traditional air-space monitoring technologies to accurately characterise such vehicles. Here, we present a CNN using a decision tree and ensemble structure to fully characterise drones in flight. Our system determines the drone type, orientation (in terms of pitch, roll, and yaw), and performs segmentation to classify different body parts (engines, body, and camera). We also provide a computer model for the rapid generation of large quantities of accurately labelled photo-realistic training data and demonstrate that this data is of sufficient fidelity to allow the system to accurately characterise real drones in flight. Our network will provide a valuable tool in the image processing chain where it may build upon existing drone detection technologies to provide complete drone characterisation over wide areas.

Real-time, low-cost multi-person 3D pose estimation

Oct 11, 2021

Alice Ruget, Max Tyler, Germán Mora Martín, Stirling Scholes, Feng Zhu, Istvan Gyongy, Brent Hearn, Steve McLaughlin, Abderrahim Halimi, Jonathan Leach

Abstract:The process of tracking human anatomy in computer vision is referred to as pose estimation, and it is used in fields ranging from gaming to surveillance. Three-dimensional pose estimation traditionally requires advanced equipment, such as multiple linked intensity cameras or high-resolution time-of-flight cameras to produce depth images. However, there are applications, e.g. consumer electronics, where significant constraints are placed on the size, power consumption, weight and cost of the usable technology. Here, we demonstrate that computational imaging methods can achieve accurate pose estimation and overcome the apparent limitations of time-of-flight sensors designed for much simpler tasks. The sensor we use is already widely integrated in consumer-grade mobile devices, and despite its low spatial resolution, only 4×4 pixels, our proposed Pixels2Pose system transforms its data into accurate depth maps and 3D pose data of multiple people up to a distance of 3 m from the sensor. We are able to generate depth maps at a resolution of 32×32 and 3D localization of body parts with an error of only ≈10 cm at a frame rate of 7 fps. This work opens up promising real-life applications in scenarios that were previously restricted by the advanced hardware requirements and cost of time-of-flight technology.

High-speed object detection with a single-photon time-of-flight image sensor

Jul 28, 2021

Germán Mora-Martín, Alex Turpin, Alice Ruget, Abderrahim Halimi, Robert Henderson, Jonathan Leach, Istvan Gyongy

Abstract:3D time-of-flight (ToF) imaging is used in a variety of applications such as augmented reality (AR), computer interfaces, robotics and autonomous systems. Single-photon avalanche diodes (SPADs) are one of the enabling technologies providing accurate depth data even over long ranges. By developing SPADs in array format with integrated processing combined with pulsed, flood-type illumination, high-speed 3D capture is possible. However, array sizes tend to be relatively small, limiting the lateral resolution of the resulting depth maps, and, consequently, the information that can be extracted from the image for applications such as object detection. In this paper, we demonstrate that these limitations can be overcome through the use of convolutional neural networks (CNNs) for high-performance object detection. We present outdoor results from a portable SPAD camera system that outputs 16-bin photon timing histograms with 64x32 spatial resolution. The results, obtained with exposure times down to 2 ms (equivalent to 500 FPS) and in signal-to-background ratios (SBR) as low as 0.05, point to the advantages of providing the CNN with full histogram data rather than point clouds alone. Alternatively, a combination of point cloud and active intensity data may be used as input, for a similar level of performance. In either case, the GPU-accelerated processing time is less than 1 ms per frame, leading to an overall latency (image acquisition plus processing) in the millisecond range, making the results relevant for safety-critical computer vision applications which would benefit from faster-than-human reaction times.

* 13 pages, 5 figures, 3 tables
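The timing figures quoted in the abstract can be checked with a few lines of Python. This is a sketch of the arithmetic only: it assumes the frame rate is the reciprocal of the exposure time and that acquisition and processing run back to back without pipelining. The function names are hypothetical, and the 1 ms processing time is the upper bound stated in the abstract.

```python
def frame_rate_fps(exposure_s):
    """Frame rate if each frame takes exactly one exposure period."""
    return 1.0 / exposure_s


def overall_latency_s(exposure_s, processing_s):
    """Acquisition plus processing, assuming the two stages run in sequence."""
    return exposure_s + processing_s


fps = frame_rate_fps(2e-3)               # 2 ms exposure
latency = overall_latency_s(2e-3, 1e-3)  # processing bounded by 1 ms
print(f"{fps:.0f} FPS, latency up to {latency * 1e3:.0f} ms")
```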
