Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bharatesh Chakravarthi

How Real is CARLAs Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection

Jun 16, 2025

Kaiyuan Tan, Pavan Kumar B N, Bharatesh Chakravarthi

Abstract:Event cameras are gaining traction in traffic monitoring applications due to their low latency, high temporal resolution, and energy efficiency, which makes them well-suited for real-time object detection at traffic intersections. However, the development of robust event-based detection models is hindered by the limited availability of annotated real-world datasets. To address this, several simulation tools have been developed to generate synthetic event data. Among these, the CARLA driving simulator includes a built-in dynamic vision sensor (DVS) module that emulates event camera output. Despite its potential, the sim-to-real gap for event-based object detection remains insufficiently studied. In this work, we present a systematic evaluation of this gap by training a recurrent vision transformer model exclusively on synthetic data generated using CARLAs DVS and testing it on varying combinations of synthetic and real-world event streams. Our experiments show that models trained solely on synthetic data perform well on synthetic-heavy test sets but suffer significant performance degradation as the proportion of real-world data increases. In contrast, models trained on real-world data demonstrate stronger generalization across domains. This study offers the first quantifiable analysis of the sim-to-real gap in event-based object detection using CARLAs DVS. Our findings highlight limitations in current DVS simulation fidelity and underscore the need for improved domain adaptation techniques in neuromorphic vision for traffic monitoring.

Via

Access Paper or Ask Questions

Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space

Apr 21, 2025

Kaustav Chanda, Aayush Atul Verma, Arpitsinh Vaghela, Yezhou Yang, Bharatesh Chakravarthi

Abstract:Event cameras promise a paradigm shift in vision sensing with their low latency, high dynamic range, and asynchronous nature of events. Unfortunately, the scarcity of high-quality labeled datasets hinders their widespread adoption in deep learning-driven computer vision. To mitigate this, several simulators have been proposed to generate synthetic event data for training models for detection and estimation tasks. However, the fundamentally different sensor design of event cameras compared to traditional frame-based cameras poses a challenge for accurate simulation. As a result, most simulated data fail to mimic data captured by real event cameras. Inspired by existing work on using deep features for image comparison, we introduce event quality score (EQS), a quality metric that utilizes activations of the RVT architecture. Through sim-to-real experiments on the DSEC driving dataset, it is shown that a higher EQS implies improved generalization to real-world data after training on simulated events. Thus, optimizing for EQS can lead to developing more realistic event camera simulators, effectively reducing the simulation gap. EQS is available at https://github.com/eventbasedvision/EQS.

* Accepted at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); Fifth International Workshop on Event-Based Vision

Via

Access Paper or Ask Questions

Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Dec 02, 2024

Joseph Raj Vishal, Divesh Basina, Aarya Choudhary, Bharatesh Chakravarthi

Figure 1 for Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Figure 2 for Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Figure 3 for Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Figure 4 for Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Abstract:Recent advances in video question answering (VideoQA) offer promising applications, especially in traffic monitoring, where efficient video interpretation is critical. Within ITS, answering complex, real-time queries like "How many red cars passed in the last 10 minutes?" or "Was there an incident between 3:00 PM and 3:05 PM?" enhances situational awareness and decision-making. Despite progress in vision-language models, VideoQA remains challenging, especially in dynamic environments involving multiple objects and intricate spatiotemporal relationships. This study evaluates state-of-the-art VideoQA models using non-benchmark synthetic and real-world traffic sequences. The framework leverages GPT-4o to assess accuracy, relevance, and consistency across basic detection, temporal reasoning, and decomposition queries. VideoLLaMA-2 excelled with 57% accuracy, particularly in compositional reasoning and consistent answers. However, all models, including VideoLLaMA-2, faced limitations in multi-object tracking, temporal coherence, and complex scene interpretation, highlighting gaps in current architectures. These findings underscore VideoQA's potential in traffic monitoring but also emphasize the need for improvements in multi-object tracking, temporal reasoning, and compositional capabilities. Enhancing these areas could make VideoQA indispensable for incident detection, traffic flow management, and responsive urban planning. The study's code and framework are open-sourced for further exploration: https://github.com/joe-rabbit/VideoQA_Pilot_Study

Via

Access Paper or Ask Questions

KAT to KANs: A Review of Kolmogorov-Arnold Networks and the Neural Leap Forward

Nov 15, 2024

Divesh Basina, Joseph Raj Vishal, Aarya Choudhary, Bharatesh Chakravarthi

Abstract:The curse of dimensionality poses a significant challenge to modern multilayer perceptron-based architectures, often causing performance stagnation and scalability issues. Addressing this limitation typically requires vast amounts of data. In contrast, Kolmogorov-Arnold Networks have gained attention in the machine learning community for their bold claim of being unaffected by the curse of dimensionality. This paper explores the Kolmogorov-Arnold representation theorem and the mathematical principles underlying Kolmogorov-Arnold Networks, which enable their scalability and high performance in high-dimensional spaces. We begin with an introduction to foundational concepts necessary to understand Kolmogorov-Arnold Networks, including interpolation methods and Basis-splines, which form their mathematical backbone. This is followed by an overview of perceptron architectures and the Universal approximation theorem, a key principle guiding modern machine learning. This is followed by an overview of the Kolmogorov-Arnold representation theorem, including its mathematical formulation and implications for overcoming dimensionality challenges. Next, we review the architecture and error-scaling properties of Kolmogorov-Arnold Networks, demonstrating how these networks achieve true freedom from the curse of dimensionality. Finally, we discuss the practical viability of Kolmogorov-Arnold Networks, highlighting scenarios where their unique capabilities position them to excel in real-world applications. This review aims to offer insights into Kolmogorov-Arnold Networks' potential to redefine scalability and performance in high-dimensional learning tasks.

Via

Access Paper or Ask Questions

Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

Sep 01, 2024

Manthan Chelenahalli Satish, Duo Lu, Bharatesh Chakravarthi, Mohammad Farhadi, Yezhou Yang

Figure 1 for Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

Figure 2 for Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

Figure 3 for Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

Figure 4 for Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

Abstract:Traffic roundabouts, as complex and critical road scenarios, pose significant safety challenges for autonomous vehicles. In particular, the encounter of a vehicle with a dilemma zone (DZ) at a roundabout intersection is a pivotal concern. This paper presents an automated system that leverages trajectory forecasting to predict DZ events, specifically at traffic roundabouts. Our system aims to enhance safety standards in both autonomous and manual transportation. The core of our approach is a modular, graph-structured recurrent model that forecasts the trajectories of diverse agents, taking into account agent dynamics and integrating heterogeneous data, such as semantic maps. This model, based on graph neural networks, aids in predicting DZ events and enhances traffic management decision-making. We evaluated our system using a real-world dataset of traffic roundabout intersections. Our experimental results demonstrate that our dilemma forecasting system achieves a high precision with a low false positive rate of 0.1. This research represents an advancement in roundabout DZ data mining and forecasting, contributing to the assurance of intersection safety in the era of autonomous vehicles.

Via

Access Paper or Ask Questions

Recent Event Camera Innovations: A Survey

Aug 27, 2024

Bharatesh Chakravarthi, Aayush Atul Verma, Kostas Daniilidis, Cornelia Fermuller, Yezhou Yang

Figure 1 for Recent Event Camera Innovations: A Survey

Figure 2 for Recent Event Camera Innovations: A Survey

Figure 3 for Recent Event Camera Innovations: A Survey

Figure 4 for Recent Event Camera Innovations: A Survey

Abstract:Event-based vision, inspired by the human visual system, offers transformative capabilities such as low latency, high dynamic range, and reduced power consumption. This paper presents a comprehensive survey of event cameras, tracing their evolution over time. It introduces the fundamental principles of event cameras, compares them with traditional frame cameras, and highlights their unique characteristics and operational differences. The survey covers various event camera models from leading manufacturers, key technological milestones, and influential research contributions. It explores diverse application areas across different domains and discusses essential real-world and synthetic datasets for research advancement. Additionally, the role of event camera simulators in testing and development is discussed. This survey aims to consolidate the current state of event cameras and inspire further innovation in this rapidly evolving field. To support the research community, a GitHub page (https://github.com/chakravarthi589/Event-based-Vision_Resources) categorizes past and future research articles and consolidates valuable resources.

Via

Access Paper or Ask Questions

SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

Aug 18, 2024

Tiejin Chen, Prithvi Shirke, Bharatesh Chakravarthi, Arpitsinh Vaghela, Longchao Da, Duo Lu, Yezhou Yang, Hua Wei

Figure 1 for SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

Figure 2 for SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

Figure 3 for SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

Figure 4 for SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

Abstract:This paper introduces SynTraC, the first public image-based traffic signal control dataset, aimed at bridging the gap between simulated environments and real-world traffic management challenges. Unlike traditional datasets for traffic signal control which aim to provide simplified feature vectors like vehicle counts from traffic simulators, SynTraC provides real-style images from the CARLA simulator with annotated features, along with traffic signal states. This image-based dataset comes with diverse real-world scenarios, including varying weather and times of day. Additionally, SynTraC also provides different reward values for advanced traffic signal control algorithms like reinforcement learning. Experiments with SynTraC demonstrate that it is still an open challenge to image-based traffic signal control methods compared with feature-based control methods, indicating our dataset can further guide the development of future algorithms. The code for this paper can be found in \url{https://github.com/DaRL-LibSignal/SynTraC}.SynTraC

* Accepted to IEEE ITSC2024

Via

Access Paper or Ask Questions

SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception

Apr 12, 2024

Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang

Abstract:Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind multi-view ego, and fixed perception synthetic event-based dataset using multiple dynamic vision sensors within the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy) with domain shifts (discrete and continuous). SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, semantic, and instance segmentation, facilitating a comprehensive understanding of the scene. Furthermore, we evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection tasks and provide baseline benchmarks for assessment. Additionally, we conduct experiments to assess the synthetic event-based dataset's generalization capabilities. The dataset is available at https://eventbasedvision.github.io/SEVD

Via

Access Paper or Ask Questions

eTraM: Event-based Traffic Monitoring Dataset

Apr 02, 2024

Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

Figure 1 for eTraM: Event-based Traffic Monitoring Dataset

Abstract:Event cameras, with their high temporal and dynamic range and minimal memory usage, have found applications in various fields. However, their potential in static traffic monitoring remains largely unexplored. To facilitate this exploration, we present eTraM - a first-of-its-kind, fully event-based traffic monitoring dataset. eTraM offers 10 hr of data from different traffic scenarios in various lighting and weather conditions, providing a comprehensive overview of real-world situations. Providing 2M bounding box annotations, it covers eight distinct classes of traffic participants, ranging from vehicles to pedestrians and micro-mobility. eTraM's utility has been assessed using state-of-the-art methods for traffic participant detection, including RVT, RED, and YOLOv8. We quantitatively evaluate the ability of event-based models to generalize on nighttime and unseen scenes. Our findings substantiate the compelling potential of leveraging event cameras for traffic monitoring, opening new avenues for research and application. eTraM is available at https://eventbasedvision.github.io/eTraM

Via

Access Paper or Ask Questions

SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras

Sep 04, 2023

Himanshu Pahadia, Duo Lu, Bharatesh Chakravarthi, Yezhou Yang

Abstract:Intelligent transportation systems (ITS) have revolutionized modern road infrastructure, providing essential functionalities such as traffic monitoring, road safety assessment, congestion reduction, and law enforcement. Effective vehicle detection and accurate vehicle pose estimation are crucial for ITS, particularly using monocular cameras installed on the road infrastructure. One fundamental challenge in vision-based vehicle monitoring is keypoint detection, which involves identifying and localizing specific points on vehicles (such as headlights, wheels, taillights, etc.). However, this task is complicated by vehicle model and shape variations, occlusion, weather, and lighting conditions. Furthermore, existing traffic perception datasets for keypoint detection predominantly focus on frontal views from ego vehicle-mounted sensors, limiting their usability in traffic monitoring. To address these issues, we propose SKoPe3D, a unique synthetic vehicle keypoint dataset generated using the CARLA simulator from a roadside perspective. This comprehensive dataset includes generated images with bounding boxes, tracking IDs, and 33 keypoints for each vehicle. Spanning over 25k images across 28 scenes, SKoPe3D contains over 150k vehicle instances and 4.9 million keypoints. To demonstrate its utility, we trained a keypoint R-CNN model on our dataset as a baseline and conducted a thorough evaluation. Our experiments highlight the dataset's applicability and the potential for knowledge transfer between synthetic and real-world data. By leveraging the SKoPe3D dataset, researchers and practitioners can overcome the limitations of existing datasets, enabling advancements in vehicle keypoint detection for ITS.

* Accepted to IEEE ITSC 2023

Via

Access Paper or Ask Questions