Abstract:Fault diagnosis is crucial for complex autonomous mobile systems, especially for modern-day autonomous driving (AD). Different actors, numerous use cases, and complex heterogeneous components motivate a fault diagnosis of the system and overall system integrity. AD systems are composed of many heterogeneous components, each with different functionality and possibly using a different algorithm (e.g., rule-based vs. AI components). In addition, these components are subject to the vehicle's driving state and are highly dependent. This paper, therefore, faces this problem by presenting the concept of a modular fault diagnosis framework for AD systems. The concept suggests modular state monitoring and diagnosis elements, together with a state- and dependency-aware aggregation method. Our proposed classification scheme allows for the categorization of the fault diagnosis modules. The concept is implemented on AD shuttle buses and evaluated to demonstrate its capabilities.
Abstract:As cities strive to address urban mobility challenges, combining autonomous transportation technologies with intelligent infrastructure presents an opportunity to transform how people move within urban environments. Autonomous shuttles are particularly suited for adaptive and responsive public transport for the first and last mile, connecting with smart infrastructure to enhance urban transit. This paper presents the concept, implementation, and evaluation of a proof-of-concept deployment of an autonomous shuttle integrated with smart infrastructure at a public fair. The infrastructure includes two perception-equipped bus stops and a connected pedestrian intersection, all linked through a central communication and control hub. Our key contributions include the development of a comprehensive system architecture for "smart" bus stops, the integration of multiple urban locations into a cohesive smart transport ecosystem, and the creation of adaptive shuttle behavior for automated driving. Additionally, we publish an open source dataset and a Vehicle-to-X (V2X) driver to support further research. Finally, we offer an outlook on future research directions and potential expansions of the demonstrated technologies and concepts.
Abstract:Recently, LiDAR perception methods for autonomous vehicles, powered by deep neural networks have experienced steep growth in performance on classic benchmarks, such as nuScenes and SemanticKITTI. However, there are still large gaps in performance when deploying models trained on such single-sensor setups to modern multi-sensor vehicles. In this work, we investigate if a lack of invariance may be responsible for these performance gaps, and propose some initial solutions in the form of application-specific data augmentations, which can facilitate better transfer to multi-sensor LiDAR setups. We provide experimental evidence that our proposed augmentations improve generalization across LiDAR sensor setups, and investigate how these augmentations affect the models' invariance properties on simulations of different LiDAR sensor setups.
Abstract:Effective traffic light detection is a critical component of the perception stack in autonomous vehicles. This work introduces a novel deep-learning detection system while addressing the challenges of previous work. Utilizing a comprehensive dataset amalgamation, including the Bosch Small Traffic Lights Dataset, LISA, the DriveU Traffic Light Dataset, and a proprietary dataset from Karlsruhe, we ensure a robust evaluation across varied scenarios. Furthermore, we propose a relevance estimation system that innovatively uses directional arrow markings on the road, eliminating the need for prior map creation. On the DriveU dataset, this approach results in 96% accuracy in relevance estimation. Finally, a real-world evaluation is performed to evaluate the deployment and generalizing abilities of these models. For reproducibility and to facilitate further research, we provide the model weights and code: https://github.com/KASTEL-MobilityLab/traffic-light-detection.
Abstract:Representing diverse and plausible future trajectories of actors is crucial for motion forecasting in autonomous driving. However, efficiently capturing the true trajectory distribution with a compact set is challenging. In this work, we propose a novel approach for generating scene-specific trajectory sets that better represent the diversity and admissibility of future actor behavior. Our method constructs multiple trajectory sets tailored to different scene contexts, such as intersections and non-intersections, by leveraging map information and actor dynamics. We introduce a deterministic goal sampling algorithm that identifies relevant map regions and generates trajectories conditioned on the scene layout. Furthermore, we empirically investigate various sampling strategies and set sizes to optimize the trade-off between coverage and diversity. Experiments on the Argoverse 2 dataset demonstrate that our scene-specific sets achieve higher plausibility while maintaining diversity compared to traditional single-set approaches. The proposed Recursive In-Distribution Subsampling (RIDS) method effectively condenses the representation space and outperforms metric-driven sampling in terms of trajectory admissibility. Our work highlights the benefits of scene-aware trajectory set generation for capturing the complex and heterogeneous nature of actor behavior in real-world driving scenarios.
Abstract:In real-world autonomous driving, deep learning models can experience performance degradation due to distributional shifts between the training data and the driving conditions encountered. As is typical in machine learning, it is difficult to acquire a large and potentially representative labeled test set to validate models in preparation for deployment in the wild. In this work, we introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors. We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner and detect and classify model discrepancies subsequently. We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds, for an extensive quantitative analysis.
Abstract:In autonomous driving, the most challenging scenarios are the ones that can only be detected within their temporal context. Most video anomaly detection approaches focus either on surveillance or traffic accidents, which are only a subfield of autonomous driving. In this work, we present HF$^2$-VAD$_{AD}$, a variation of the HF$^2$-VAD surveillance video anomaly detection method for autonomous driving. We learn a representation of normality from a vehicle's ego perspective and evaluate pixel-wise anomaly detections in rare and critical scenarios.
Abstract:Dealing with atypical traffic scenarios remains a challenging task in autonomous driving. However, most anomaly detection approaches cannot be trained on raw sensor data but require exposure to outlier data and powerful semantic segmentation models trained in a supervised fashion. This limits the representation of normality to labeled data, which does not scale well. In this work, we revisit unsupervised anomaly detection and present UMAD, leveraging generative world models and unsupervised image segmentation. Our method outperforms state-of-the-art unsupervised anomaly detection.
Abstract:The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.
Abstract:Model compression and hardware acceleration are essential for the resource-efficient deployment of deep neural networks. Modern object detectors have highly interconnected convolutional layers with concatenations. In this work, we study how pruning can be applied to such architectures, exemplary for YOLOv7. We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers. By automating iterative sensitivity analysis, pruning, and subsequent model fine-tuning, we can significantly reduce model size both in terms of the number of parameters and FLOPs, while keeping comparable model accuracy. Finally, we deploy pruned models to FPGA and NVIDIA Jetson Xavier AGX. Pruned models demonstrate a 2x speedup for the convolutional layers in comparison to the unpruned counterparts and reach real-time capability with 14 FPS on FPGA. Our code is available at https://github.com/fzi-forschungszentrum-informatik/iterative-yolo-pruning.