Abstract:This article presents a novel approach to incorporate visual cues from video-data from a wide-angle stereo camera system mounted at an urban intersection into the forecast of cyclist trajectories. We extract features from image and optical flow (OF) sequences using 3D convolutional neural networks (3D-ConvNet) and combine them with features extracted from the cyclist's past trajectory to forecast future cyclist positions. By the use of additional information, we are able to improve positional accuracy by about 7.5 % for our test dataset and by up to 22 % for specific motion types compared to a method solely based on past trajectories. Furthermore, we compare the use of image sequences to the use of OF sequences as additional information, showing that OF alone leads to significant improvements in positional accuracy. By training and testing our methods using a real-world dataset recorded at a heavily frequented public intersection and evaluating the methods' runtimes, we demonstrate the applicability in real traffic scenarios. Our code and parts of our dataset are made publicly available.
Abstract:In this article, an approach for probabilistic trajectory forecasting of vulnerable road users (VRUs) is presented, which considers past movements and the surrounding scene. Past movements are represented by 3D poses reflecting the posture and movements of individual body parts. The surrounding scene is modeled in the form of semantic maps showing, e.g., the course of streets, sidewalks, and the occurrence of obstacles. The forecasts are generated in grids discretizing the space and in the form of arbitrary discrete probability distributions. The distributions are evaluated in terms of their reliability, sharpness, and positional accuracy. We compare our method with an approach that provides forecasts in the form of Gaussian distributions and discuss the respective advantages and disadvantages. Thereby, we investigate the impact of using poses and semantic maps. With a technique called spatial label smoothing, our approach achieves reliable forecasts. Overall, the poses have a positive impact on the forecasts. The semantic maps offer the opportunity to adapt the probability distributions to the individual situation, although at the considered forecasted time horizon of 2.52 s they play a minor role compared to the past movements of the VRU. Our method is evaluated on a dataset recorded in inner-city traffic using a research vehicle. The dataset is made publicly available.
Abstract:This article presents a holistic approach for probabilistic cyclist intention detection. A basic movement detection based on motion history images (MHI) and a residual convolutional neural network (ResNet) are used to estimate probabilities for the current cyclist motion state. These probabilities are used as weights in a probabilistic ensemble trajectory forecast. The ensemble consists of specialized models, which produce individual forecasts in the form of Gaussian distributions under the assumption of a certain motion state of the cyclist (e.g. cyclist is starting or turning left). By weighting the specialized models, we create forecasts in the from of Gaussian mixtures that define regions within which the cyclists will reside with a certain probability. To evaluate our method, we rate the reliability, sharpness, and positional accuracy of our forecasted distributions. We compare our method to a single model approach which produces forecasts in the form of Gaussian distributions and show that our method is able to produce more reliable and sharper outputs while retaining comparable positional accuracy. Both methods are evaluated using a dataset created at a public traffic intersection. Our code and the dataset are made publicly available.
Abstract:In future, vehicles and other traffic participants will be interconnected and equipped with various types of sensors, allowing for cooperation on different levels, such as situation prediction or intention detection. In this article we present a cooperative approach for starting movement detection of cyclists using a boosted stacking ensemble approach realizing feature- and decision level cooperation. We introduce a novel method based on a 3D Convolutional Neural Network (CNN) to detect starting motions on image sequences by learning spatio-temporal features. The CNN is complemented by a smart device based starting movement detection originating from smart devices carried by the cyclist. Both model outputs are combined in a stacking ensemble approach using an extreme gradient boosting classifier resulting in a fast and yet robust cooperative starting movement detector. We evaluate our cooperative approach on real-world data originating from experiments with 49 test subjects consisting of 84 starting motions.
Abstract:Vulnerable road users (VRUs, i.e. cyclists and pedestrians) will play an important role in future traffic. To avoid accidents and achieve a highly efficient traffic flow, it is important to detect VRUs and to predict their intentions. In this article a holistic approach for detecting intentions of VRUs by cooperative methods is presented. The intention detection consists of basic movement primitive prediction, e.g. standing, moving, turning, and a forecast of the future trajectory. Vehicles equipped with sensors, data processing systems and communication abilities, referred to as intelligent vehicles, acquire and maintain a local model of their surrounding traffic environment, e.g. crossing cyclists. Heterogeneous, open sets of agents (cooperating and interacting vehicles, infrastructure, e.g. cameras and laser scanners, and VRUs equipped with smart devices and body-worn sensors) exchange information forming a multi-modal sensor system with the goal to reliably and robustly detect VRUs and their intentions under consideration of real time requirements and uncertainties. The resulting model allows to extend the perceptual horizon of the individual agent beyond their own sensory capabilities, enabling a longer forecast horizon. Concealments, implausibilities and inconsistencies are resolved by the collective intelligence of cooperating agents. Novel techniques of signal processing and modelling in combination with analytical and learning based approaches of pattern and activity recognition are used for detection, as well as intention prediction of VRUs. Cooperation, by means of probabilistic sensor and knowledge fusion, takes place on the level of perception and intention recognition. Based on the requirements of the cooperative approach for the communication a new strategy for an ad hoc network is proposed.
Abstract:In future traffic scenarios, vehicles and other traffic participants will be interconnected and equipped with various types of sensors, allowing for cooperation based on data or information exchange. This article presents an approach to cooperative tracking of cyclists using smart devices and infrastructure-based sensors. A smart device is carried by the cyclists and an intersection is equipped with a wide angle stereo camera system. Two tracking models are presented and compared. The first model is based on the stereo camera system detections only, whereas the second model cooperatively combines the camera based detections with velocity and yaw rate data provided by the smart device. Our aim is to overcome limitations of tracking approaches based on single data sources. We show in numerical evaluations on scenes where cyclists are starting or turning right that the cooperation leads to an improvement in both the ability to keep track of a cyclist and the accuracy of the track particularly when it comes to occlusions in the visual system. We, therefore, contribute to the safety of vulnerable road users in future traffic.
Abstract:Avoiding collisions with vulnerable road users (VRUs) using sensor-based early recognition of critical situations is one of the manifold opportunities provided by the current development in the field of intelligent vehicles. As especially pedestrians and cyclists are very agile and have a variety of movement options, modeling their behavior in traffic scenes is a challenging task. In this article we propose movement models based on machine learning methods, in particular artificial neural networks, in order to classify the current motion state and to predict the future trajectory of VRUs. Both model types are also combined to enable the application of specifically trained motion predictors based on a continuously updated pseudo probabilistic state classification. Furthermore, the architecture is used to evaluate motion-specific physical models for starting and stopping and video-based pedestrian motion classification. A comprehensive dataset consisting of 1068 pedestrian and 494 cyclist scenes acquired at an urban intersection is used for optimization, training, and evaluation of the different models. The results show substantial higher classification rates and the ability to earlier recognize motion state changes with the machine learning approaches compared to interacting multiple model (IMM) Kalman Filtering. The trajectory prediction quality is also improved for all kinds of test scenes, especially when starting and stopping motions are included. Here, 37\% and 41\% lower position errors were achieved on average, respectively.
Abstract:Highly automated driving requires precise models of traffic participants. Many state of the art models are currently based on machine learning techniques. Among others, the required amount of labeled data is one major challenge. An autonomous learning process addressing this problem is proposed. The initial models are iteratively refined in three steps: (1) detection and context identification, (2) novelty detection and active learning and (3) online model adaption.
Abstract:In this article, we present a novel approach to detect starting motions of cyclists in real world traffic scenarios based on Motion History Images (MHIs). The method uses a deep Convolutional Neural Network (CNN) with a residual network architecture (ResNet), which is commonly used in image classification and detection tasks. By combining MHIs with a ResNet classifier and performing a frame by frame classification of the MHIs, we are able to detect starting motions in image sequences. The detection is performed using a wide angle stereo camera system at an urban intersection. We compare our algorithm to an existing method to detect movement transitions of pedestrians that uses MHIs in combination with a Histograms of Oriented Gradients (HOG) like descriptor and a Support Vector Machine (SVM), which we adapted to cyclists. To train and evaluate the methods a dataset containing MHIs of 394 cyclist starting motions was created. The results show that both methods can be used to detect starting motions of cyclists. Using the SVM approach, we were able to safely detect starting motions 0.506 s on average after the bicycle starts moving with an F1-score of 97.7%. The ResNet approach achieved an F1-score of 100% at an average detection time of 0.144 s. The ResNet approach outperformed the SVM approach in both robustness against false positive detections and detection time.