Abstract:This paper presents an autonomous method to address challenges arising from severe lighting conditions in machine vision applications that use event cameras. To manage these conditions, the research explores the built-in ability of these cameras to adjust pixel functionality through their bias settings. As cars are driven at various times and locations, shifts in lighting conditions are unavoidable. Consequently, this paper uses the neuromorphic YOLO-based face tracking module of a driver monitoring system as the event-based application to study. The proposed method uses numerical metrics to continuously monitor the performance of the event-based application in real time. When the application malfunctions, the system detects this through a drop in the metrics and automatically adjusts the event camera's bias values. The Nelder-Mead simplex algorithm is employed to optimize this adjustment, with fine-tuning continuing until performance returns to a satisfactory level. The advantage of bias optimization lies in its ability to handle conditions such as flickering or darkness without requiring additional hardware or software. To demonstrate the capabilities of the proposed system, it was tested under conditions where detecting human faces with default bias values was impossible. These severe conditions were simulated using dim ambient light and various flickering frequencies. Following the automatic and dynamic process of bias modification, the metrics for face detection improved significantly under all conditions. Autobiasing increased the YOLO confidence indicators by more than 33 percent for object detection and 37 percent for face detection, highlighting the effectiveness of the proposed method.
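The monitor-then-optimize loop described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two generic bias values, the `simulated_confidence` function (a smooth stand-in for the detector confidence that a real system would measure from the camera), the threshold of 0.8, and the compact Nelder-Mead variant written out in plain Python. It is not the authors' implementation.

```python
import math

def simulated_confidence(biases):
    """Hypothetical stand-in for the measured detector confidence.
    It peaks at 0.9 for the made-up bias pair (30, -20)."""
    b0, b1 = biases
    return 0.9 * math.exp(-((b0 - 30.0) ** 2 + (b1 + 20.0) ** 2) / (2 * 40.0 ** 2))

def nelder_mead(f, x0, step=1.0, max_iter=200, tol=1e-6):
    """Minimize f with a basic Nelder-Mead simplex search
    (reflection, expansion, inside contraction, shrink)."""
    n = len(x0)
    # Initial simplex: the start point plus one vertex offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]

def autobias(current_biases, threshold=0.8):
    """Leave the biases alone while the metric is satisfactory;
    otherwise search for biases that maximize it."""
    confidence = simulated_confidence(current_biases)
    if confidence >= threshold:
        return list(current_biases), confidence
    best, neg_conf = nelder_mead(
        lambda b: -simulated_confidence(b), current_biases, step=10.0
    )
    return best, -neg_conf

if __name__ == "__main__":
    biases, confidence = autobias([0.0, 0.0])
    print(biases, confidence)
```

In a real system each function evaluation would mean writing the candidate biases to the sensor and measuring the metric over a short window, so the number of evaluations matters far more than it does in this simulation.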
Abstract:Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events". This distinct data format mitigates common issues observed in conventional cameras, such as under-sampling when capturing fast-moving objects, thereby preserving critical information that might otherwise be lost. However, leveraging this data often necessitates the development of specialized, handcrafted event representations that can integrate seamlessly with conventional Convolutional Neural Networks (CNNs), considering the unique attributes of event data. In this study, we evaluate event-based face and eye tracking. The core objective of our study is to showcase the viability of integrating conventional algorithms with event-based data transformed into a frame format, while preserving the unique benefits of event cameras. To validate our approach, we constructed a frame-based event dataset by simulating events between RGB frames derived from the publicly accessible Helen Dataset. We assess its utility for face and eye detection tasks through the application of GR-YOLO, a pioneering technique derived from YOLOv3. This evaluation includes a comparative analysis with results obtained by training YOLOv8 on the same dataset. Subsequently, the trained models were tested on real event streams from various iterations of Prophesee's event cameras and further evaluated on the Faces in Event Stream (FES) benchmark dataset. The models trained on our dataset show good prediction performance across all the datasets obtained for validation, with a best mean average precision (mAP) score of 0.91. Additionally, the trained models demonstrated robust performance on real event camera data under varying light conditions.
Abstract:This study introduces a novel approach to enhance the spatial-temporal resolution of time-event pixels based on luminance changes captured by event cameras. These cameras present unique challenges due to their low resolution and the sparse, asynchronous nature of the data they collect. Current event super-resolution algorithms are not fully optimized for the distinct data structure produced by event cameras, which limits their ability to capture the full dynamism and detail of visual scenes at low computational cost. To bridge this gap, our research proposes a method that integrates binary spikes with Sigma Delta Neural Networks (SDNNs), leveraging a spatiotemporal constraint learning mechanism designed to simultaneously learn the spatial and temporal distributions of the event stream. The proposed network is evaluated using widely recognized benchmark datasets, including N-MNIST, CIFAR10-DVS, ASL-DVS, and Event-NFS. A comprehensive evaluation framework is employed, assessing both the accuracy, through root mean square error (RMSE), and the computational efficiency of our model. The findings demonstrate significant improvements over existing state-of-the-art methods. Specifically, the proposed method surpasses the state of the art in computational efficiency, achieving a 17.04-fold improvement in event sparsity and a 32.28-fold increase in synaptic operation efficiency over traditional artificial neural networks, alongside a two-fold improvement over spiking neural networks.
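To give a rough intuition for why sigma-delta processing yields sparse activity, the sketch below shows the generic delta-encode / sigma-decode idea on a one-dimensional signal: a message is sent only when the value has drifted from the receiver's reconstruction by at least a threshold. The function names, the threshold, and the toy signal are illustrative assumptions; this is not the network proposed in the abstract.

```python
def delta_encode(signal, threshold):
    """Emit a quantized change only when the input has moved at least
    `threshold` away from what the receiver has reconstructed so far."""
    messages = []
    reconstruction = 0.0  # value currently held by the receiver
    for value in signal:
        residual = value - reconstruction
        if abs(residual) >= threshold:
            # Send a whole number of threshold-sized steps (truncated toward zero).
            delta = int(residual / threshold) * threshold
            reconstruction += delta
            messages.append(delta)
        else:
            messages.append(0.0)  # nothing transmitted at this time step
    return messages

def sigma_decode(messages):
    """Integrate the received changes to recover the signal."""
    total = 0.0
    decoded = []
    for message in messages:
        total += message
        decoded.append(total)
    return decoded

if __name__ == "__main__":
    signal = [0.0, 0.05, 0.1, 1.0, 1.02, 1.01, 0.0]
    messages = delta_encode(signal, threshold=0.25)
    print(messages)
    print(sigma_decode(messages))
```

For the toy signal above only two of the seven time steps produce a non-zero message, while the decoded signal stays within one threshold of the input. Slowly varying inputs therefore cost few downstream operations.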
Abstract:Event camera-based driver monitoring is emerging as a pivotal area of research, driven by its significant advantages such as rapid response, low latency, power efficiency, enhanced privacy, and prevention of undersampling. Effective detection of driver distraction is crucial in driver monitoring systems to enhance road safety and reduce accident rates. Integrating an optimized sensor, such as an event camera, with an optimized network is essential for maximizing these benefits. This paper introduces the concept of sensing without seeing to detect driver distraction, leveraging computationally efficient spiking neural networks (SNNs). To the best of our knowledge, this study is the first to use event camera data with spiking neural networks for driver distraction detection. The proposed Spiking-DD network not only achieves state-of-the-art performance but also has fewer parameters and provides greater accuracy than current event-based methodologies.
Abstract:Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and can reach velocities greater than those of blinks. The peak angular speed of the eye during a saccade can reach as high as 700°/s in humans, especially during larger saccades that cover a visual angle of 25°. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible Ev-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.
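One common way to present events as frames is to accumulate the events that fall inside a time window into a two-dimensional polarity histogram. The sketch below shows that idea only; the abstract does not state which representation was used, so the event tuple layout `(x, y, t, polarity)`, the signed counting, and the microsecond timestamps are assumptions made for illustration.

```python
def events_to_frame(events, width, height, t_start, t_end):
    """Accumulate events with t_start <= t < t_end into a signed count image.
    Each event is (x, y, t, polarity); a truthy polarity adds 1 (brightness
    increase) and a falsy polarity subtracts 1 (brightness decrease)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        inside_window = t_start <= t < t_end
        inside_sensor = 0 <= x < width and 0 <= y < height
        if inside_window and inside_sensor:
            frame[y][x] += 1 if polarity else -1
    return frame

if __name__ == "__main__":
    # Hypothetical events on a 2x2 sensor, timestamps in microseconds.
    events = [(0, 0, 10, 1), (0, 0, 20, 1), (1, 0, 30, 0), (1, 1, 1500, 1)]
    print(events_to_frame(events, width=2, height=2, t_start=0, t_end=1000))
```

The window length is the main design choice: a short window keeps the temporal precision needed for fast eye movements but yields sparser frames for the detector.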
Abstract:Event cameras, also known as neuromorphic cameras, are an emerging technology that offers advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harness the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flow of blood in the wrist region. We investigate whether an event camera could be used for continuous noninvasive monitoring of heart rate (HR). Event camera video data from 25 participants, comprising varying age groups and skin colours, was collected and analysed. Ground-truth HR measurements obtained using conventional methods were used to evaluate the accuracy of automatic detection of HR from event camera data. Our experimental results and comparison to the performance of other non-contact HR measurement methods demonstrate the feasibility of using event cameras for pulse detection. We also acknowledge the challenges and limitations of our method, such as light-induced flickering and the subconscious but naturally occurring tremors of an individual during data capture.
Abstract:Neuromorphic vision sensors, or event cameras, differ from conventional cameras in that they do not capture images at a specified rate. Instead, they asynchronously log local brightness changes at each pixel. As a result, event cameras only record changes in a given scene, and do so with very high temporal resolution, high dynamic range, and low power requirements. Recent research has demonstrated how these characteristics make event cameras extremely practical sensors in driver monitoring systems (DMS), enabling the tracking of high-speed eye motion and blinks. This research provides a proof of concept to expand event-based DMS techniques to include seatbelt state detection. Using an event simulator, a dataset of 108,691 synthetic neuromorphic frames of car occupants was generated from a near-infrared (NIR) dataset and split into training, validation, and test sets for a seatbelt state detection algorithm based on a recurrent convolutional neural network (CNN). In addition, a smaller set of real event data was collected and reserved for testing. In a binary classification task, the fastened/unfastened frames were identified with F1 scores of 0.989 and 0.944 on the simulated and real test sets, respectively. When the problem was extended to also classify the action of fastening/unfastening the seatbelt, respective F1 scores of 0.964 and 0.846 were achieved.
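For readers less familiar with the metric reported here, the F1 score is the harmonic mean of precision and recall. The short sketch below computes it from confusion counts; the counts in the usage example are made up for illustration and are not taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score, the harmonic mean of precision and recall.
    Returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical counts for a "fastened" class: precision and recall are both 0.9.
    print(f1_score(true_positives=90, false_positives=10, false_negatives=10))
```

Because F1 ignores true negatives, it stays informative when one class (for example, frames showing the fastening action) is much rarer than the others.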
Abstract:Optical sensors have played a pivotal role in acquiring real-world data for critical applications. This data, when integrated with advanced machine learning algorithms, provides meaningful information, thus enhancing human vision. This paper focuses on various optical technologies for the design and development of state-of-the-art out-cabin forward vision systems and in-cabin driver monitoring systems. The optical sensors considered include Longwave Infrared (LWIR) thermal imaging cameras, Near Infrared (NIR) cameras, neuromorphic/event cameras, visible CMOS cameras, and depth cameras. Further, the paper discusses different potential applications that can exploit the unique strengths of each of these optical modalities in a real-time environment.
Abstract:In this research work, we propose a thermal tiny-YOLO multi-class object detection (TTYMOD) system as a smart forward sensing system that should remain effective in all weather and harsh environmental conditions, using an end-to-end YOLO deep learning framework. It provides enhanced safety and improved awareness features for driver assistance. The system is trained on large-scale public thermal datasets as well as a newly gathered, open-source dataset comprising more than 35,000 distinct thermal frames. For optimal training and convergence of the YOLO-v5 tiny network variant on thermal data, we employed different optimizers, including stochastic gradient descent (SGD), Adam, and its variant AdamW, which has an improved implementation of weight decay. The performance of the thermally tuned tiny architecture is further evaluated on public as well as locally gathered test data in diversified and challenging weather and environmental conditions. The efficacy of the thermally tuned nano network is quantified using various quantitative metrics, including mean average precision (mAP), frames per second, and average inference time. Experimental outcomes show that the network achieved a best mAP of 56.4% with an average inference time per frame of 4 milliseconds. The study further incorporates optimization of the tiny network variant using the TensorFlow Lite quantization tool, which is beneficial for deploying deep learning architectures on edge and mobile devices. For this study, we used a Raspberry Pi 4 computing board to evaluate the real-time feasibility of an optimized version of the thermal object detection network for the automotive sensor suite. The source code, trained and optimized models, and complete validation/testing results are publicly available at https://github.com/MAli-Farooq/Thermal-YOLO-And-Model-Optimization-Using-TensorFlowLite.
Abstract:Neuromorphic vision, or event vision, is an advanced vision technology in which, in contrast to a visible camera that outputs pixels, the sensor generates neuromorphic events every time a brightness change in the field of view (FOV) exceeds a specific threshold. This study focuses on leveraging neuromorphic event data for roadside object detection. It is a proof of concept towards building artificial intelligence (AI) based pipelines that can be used in forward perception systems for advanced vehicular applications. The focus is on building efficient state-of-the-art object detection networks with better inference results for fast-moving forward perception using an event camera. In this article, the event-simulated A2D2 dataset is manually annotated and used to train two different YOLOv5 networks (small and large variants). To further assess robustness, single-model testing and ensemble-model testing are carried out.
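The thresholding behaviour described at the start of this abstract can be sketched with the usual idealized event model, in which a pixel fires when its log intensity changes by at least a contrast threshold. The function below applies that rule to two intensity frames. The threshold value, the small `eps` offset, and the one-event-per-pixel simplification are assumptions for illustration; practical event simulators also interpolate timestamps and model noise, and this is not the simulator used to produce the event-simulated dataset.

```python
import math

def simulate_events(prev_frame, next_frame, contrast_threshold=0.2, eps=1e-3):
    """Return a list of (x, y, polarity) events for pixels whose log intensity
    changed by at least `contrast_threshold` between two frames.
    Polarity is 1 for a brightness increase and 0 for a decrease."""
    events = []
    for y, (row_prev, row_next) in enumerate(zip(prev_frame, next_frame)):
        for x, (i_prev, i_next) in enumerate(zip(row_prev, row_next)):
            # eps avoids log(0) for completely dark pixels.
            delta = math.log(i_next + eps) - math.log(i_prev + eps)
            if abs(delta) >= contrast_threshold:
                events.append((x, y, 1 if delta > 0 else 0))
    return events

if __name__ == "__main__":
    # Hypothetical 2x2 intensity frames with values in [0, 1].
    previous = [[0.5, 0.5], [0.5, 0.5]]
    current = [[0.5, 1.0], [0.1, 0.52]]
    print(simulate_events(previous, current))
```

In the example, only the two pixels with a large relative change produce events; the unchanged pixel and the pixel that moved from 0.5 to 0.52 stay silent, which is the source of the sparsity that event-based pipelines exploit.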