Abstract:Trajectory prediction is crucial to advance autonomous driving, improving safety, and efficiency. Although end-to-end models based on deep learning have great potential, they often do not consider vehicle dynamic limitations, leading to unrealistic predictions. To address this problem, this work introduces a novel hybrid model that combines deep learning with a kinematic motion model. It is able to predict object attributes such as acceleration and yaw rate and generate trajectories based on them. A key contribution is the incorporation of expert knowledge into the learning objective of the deep learning model. This results in the constraint of the available action space, thus enabling the prediction of physically feasible object attributes and trajectories, thereby increasing safety and robustness. The proposed hybrid model facilitates enhanced interpretability, thereby reinforcing the trustworthiness of deep learning methods and promoting the development of safe planning solutions. Experiments conducted on the publicly available real-world Argoverse dataset demonstrate realistic driving behaviour, with benchmark comparisons and ablation studies showing promising results.
Abstract:This work introduces the conditioned Vehicle Motion Diffusion (cVMD) model, a novel network architecture for highway trajectory prediction using diffusion models. The proposed model ensures the drivability of the predicted trajectory by integrating non-holonomic motion constraints and physical constraints into the generative prediction module. Central to the architecture of cVMD is its capacity to perform uncertainty quantification, a feature that is crucial in safety-critical applications. By integrating the quantified uncertainty into the prediction process, the cVMD's trajectory prediction performance is improved considerably. The model's performance was evaluated using the publicly available highD dataset. Experiments show that the proposed architecture achieves competitive trajectory prediction accuracy compared to state-of-the-art models, while providing guaranteed drivable trajectories and uncertainty quantification.
Abstract:For automotive applications, the Graph Attention Network (GAT) is a prominently used architecture to include relational information of a traffic scenario during feature embedding. As shown in this work, however, one of the most popular GAT realizations, namely GATv2, has potential pitfalls that hinder an optimal parameter learning. Especially for small and sparse graph structures a proper optimization is problematic. To surpass limitations, this work proposes architectural modifications of GATv2. In controlled experiments, it is shown that the proposed model adaptions improve prediction performance in a node-level regression task and make it more robust to parameter initialization. This work aims for a better understanding of the attention mechanism and analyzes its interpretability of identifying causal importance.
Abstract:This work introduces the multidimensional Graph Fourier Transformation Neural Network (GFTNN) for long-term trajectory predictions on highways. Similar to Graph Neural Networks (GNNs), the GFTNN is a novel network architecture that operates on graph structures. While several GNNs lack discriminative power due to suboptimal aggregation schemes, the proposed model aggregates scenario properties through a powerful operation: the multidimensional Graph Fourier Transformation (GFT). The spatio-temporal vehicle interaction graph of a scenario is converted into a spectral scenario representation using the GFT. This beneficial representation is input to the prediction framework composed of a neural network and a descriptive decoder. Even though the proposed GFTNN does not include any recurrent element, it outperforms state-of-the-art models in the task of highway trajectory prediction. For experiments and evaluation, the publicly available datasets highD and NGSIM are used
Abstract:This work provides a comprehensive derivation of the parameter gradients for GATv2 [4], a widely used implementation of Graph Attention Networks (GATs). GATs have proven to be powerful frameworks for processing graph-structured data and, hence, have been used in a range of applications. However, the achieved performance by these attempts has been found to be inconsistent across different datasets and the reasons for this remains an open research question. As the gradient flow provides valuable insights into the training dynamics of statistically learning models, this work obtains the gradients for the trainable model parameters of GATv2. The gradient derivations supplement the efforts of [2], where potential pitfalls of GATv2 are investigated.
Abstract:Representation learning in recent years has been addressed with self-supervised learning methods. The input data is augmented into two distorted views and an encoder learns the representations that are invariant to distortions -- cross-view prediction. Augmentation is one of the key components in cross-view self-supervised learning frameworks to learn visual representations. This paper presents ExAgt, a novel method to include expert knowledge for augmenting traffic scenarios, to improve the learnt representations without any human annotation. The expert-guided augmentations are generated in an automated fashion based on the infrastructure, the interactions between the EGO and the traffic participants and an ideal sensor model. The ExAgt method is applied in two state-of-the-art cross-view prediction methods and the representations learnt are tested in downstream tasks like classification and clustering. Results show that the ExAgt method improves representation learning compared to using only standard augmentations and it provides a better representation space stability. The code is available at https://github.com/lab176344/ExAgt.
Abstract:Clustering traffic scenarios and detecting novel scenario types are required for scenario-based testing of autonomous vehicles. These tasks benefit from either good similarity measures or good representations for the traffic scenarios. In this work, an expert-knowledge aided representation learning for traffic scenarios is presented. The latent space so formed is used for successful clustering and novel scenario type detection. Expert-knowledge is used to define objectives that the latent representations of traffic scenarios shall fulfill. It is presented, how the network architecture and loss is designed from these objectives, thereby incorporating expert-knowledge. An automatic mining strategy for traffic scenarios is presented, such that no manual labeling is required. Results show the performance advantage compared to baseline methods. Additionally, extensive analysis of the latent space is performed.
Abstract:Traffic scenario categorisation is an essential component of automated driving, for e.\,g., in motion planning algorithms and their validation. Finding new relevant scenarios without handcrafted steps reduce the required resources for the development of autonomous driving dramatically. In this work, a method is proposed to address this challenge by introducing a clustering technique based on a novel data-adaptive similarity measure, called Random Forest Activation Pattern (RFAP) similarity. The RFAP similarity is generated using a tree encoding scheme in a Random Forest algorithm. The clustering method proposed in this work takes into account that there are labelled scenarios available and the information from the labelled scenarios can help to guide the clustering of unlabelled scenarios. It consists of three steps. First, a self-supervised Convolutional Neural Network~(CNN) is trained on all available traffic scenarios using a defined self-supervised objective. Second, the CNN is fine-tuned for classification of the labelled scenarios. Third, using the labelled and unlabelled scenarios an iterative optimisation procedure is performed for clustering. In the third step at each epoch of the iterative optimisation, the CNN is used as a feature generator for an unsupervised Random Forest. The trained forest, in turn, provides the RFAP similarity to adapt iteratively the feature generation process implemented by the CNN. Extensive experiments and ablation studies have been done on the highD dataset. The proposed method shows superior performance compared to baseline clustering techniques.
Abstract:An understanding and classification of driving scenarios are important for testing and development of autonomous driving functionalities. Machine learning models are useful for scenario classification but most of them assume that data received during the testing are from one of the classes used in the training. This assumption is not true always because of the open environment where vehicles operate. This is addressed by a new machine learning paradigm called open-set recognition. Open-set recognition is the problem of assigning test samples to one of the classes used in training or to an unknown class. This work proposes a combination of Convolutional Neural Networks (CNN) and Random Forest (RF) for open set recognition of traffic scenarios. CNNs are used for the feature generation and the RF algorithm along with extreme value theory for the detection of known and unknown classes. The proposed solution is featured by exploring the vote patterns of trees in RF instead of just majority voting. By inheriting the ensemble nature of RF, the vote pattern of all trees combined with extreme value theory is shown to be well suited for detecting unknown classes. The proposed method has been tested on the highD and OpenTraffic datasets and has demonstrated superior performance in various aspects compared to existing solutions.
Abstract:Detecting unknown and untested scenarios is crucial for scenario-based testing. Scenario-based testing is considered to be a possible approach to validate autonomous vehicles. A traffic scenario consists of multiple components, with infrastructure being one of it. In this work, a method to detect novel traffic scenarios based on their infrastructure images is presented. An autoencoder triplet network provides latent representations for infrastructure images which are used for outlier detection. The triplet training of the network is based on the connectivity graphs of the infrastructure. By using the proposed architecture, expert-knowledge is used to shape the latent space such that it incorporates a pre-defined similarity in the neighborhood relationships of an autoencoder. An ablation study on the architecture is highlighting the importance of the triplet autoencoder combination. The best performing architecture is based on vision transformers, a convolution-free attention-based network. The presented method outperforms other state-of-the-art outlier detection approaches.