Abstract: This article presents a novel approach for incorporating visual cues from the video data of a wide-angle stereo camera system mounted at an urban intersection into the forecast of cyclist trajectories. We extract features from image and optical flow (OF) sequences using 3D convolutional neural networks (3D-ConvNets) and combine them with features extracted from the cyclist's past trajectory to forecast future cyclist positions. Using this additional information, we improve positional accuracy by about 7.5% on our test dataset and by up to 22% for specific motion types compared to a method based solely on past trajectories. Furthermore, we compare image sequences and OF sequences as additional information, showing that OF alone leads to significant improvements in positional accuracy. By training and testing our methods on a real-world dataset recorded at a heavily frequented public intersection and evaluating the methods' runtimes, we demonstrate their applicability in real traffic scenarios. Our code and parts of our dataset are made publicly available.
Abstract: In this article, we present an approach for probabilistic trajectory forecasting of vulnerable road users (VRUs) that considers both past movements and the surrounding scene. Past movements are represented by 3D poses reflecting the posture and movements of individual body parts. The surrounding scene is modeled in the form of semantic maps showing, e.g., the course of streets and sidewalks and the occurrence of obstacles. The forecasts are generated on grids that discretize the space and take the form of arbitrary discrete probability distributions. The distributions are evaluated in terms of their reliability, sharpness, and positional accuracy. We compare our method with an approach that provides forecasts in the form of Gaussian distributions and discuss the respective advantages and disadvantages. In doing so, we investigate the impact of using poses and semantic maps. With a technique called spatial label smoothing, our approach achieves reliable forecasts. Overall, the poses have a positive impact on the forecasts. The semantic maps offer the opportunity to adapt the probability distributions to the individual situation, although at the considered forecast horizon of 2.52 s they play a minor role compared to the past movements of the VRU. Our method is evaluated on a dataset recorded in inner-city traffic using a research vehicle. The dataset is made publicly available.
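The abstract above names spatial label smoothing but does not define it. As a rough illustration only, the following sketch assumes it means replacing a one-hot training target on the forecast grid with a normalized Gaussian bump centered on the true cell. The function name `smoothed_grid_label`, the isotropic kernel, and the parameter `sigma` (in grid cells) are all assumptions made for this example and are not taken from the article.

```python
import math


def smoothed_grid_label(rows, cols, true_cell, sigma):
    """Return a rows x cols target distribution that sums to 1.

    Assumption for illustration: the probability mass is spread around
    `true_cell` with an isotropic Gaussian kernel of width `sigma` (in cells)
    instead of being placed entirely on the true cell.
    """
    true_row, true_col = true_cell
    weights = [
        [
            math.exp(-((r - true_row) ** 2 + (c - true_col) ** 2) / (2.0 * sigma ** 2))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    total = sum(sum(row) for row in weights)
    # Normalize so the grid is a valid discrete probability distribution.
    return [[w / total for w in row] for row in weights]


label = smoothed_grid_label(rows=5, cols=5, true_cell=(2, 2), sigma=1.0)
```

Under this assumption, cells near the true position receive part of the target mass, so a forecast that is off by one cell is penalized less than one that is far away.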
Abstract: This article presents a holistic approach for probabilistic cyclist intention detection. A basic movement detection based on motion history images (MHIs) and a residual convolutional neural network (ResNet) is used to estimate probabilities for the current cyclist motion state. These probabilities are used as weights in a probabilistic ensemble trajectory forecast. The ensemble consists of specialized models, each of which produces a forecast in the form of a Gaussian distribution under the assumption of a certain motion state of the cyclist (e.g., the cyclist is starting or turning left). By weighting the specialized models, we create forecasts in the form of Gaussian mixtures that define regions within which the cyclist will reside with a certain probability. To evaluate our method, we rate the reliability, sharpness, and positional accuracy of our forecasted distributions. We compare our method to a single-model approach, which produces forecasts in the form of Gaussian distributions, and show that our method produces more reliable and sharper outputs while retaining comparable positional accuracy. Both methods are evaluated using a dataset created at a public traffic intersection. Our code and the dataset are made publicly available.
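The weighting of specialized models described above can be sketched as a Gaussian mixture whose mixing weights are the estimated motion-state probabilities. The example below is a minimal illustration for a single forecast time step in 2D. The function names, the example motion states, and all numeric values are hypothetical and do not come from the article.

```python
import math


def gaussian_pdf_2d(point, mean, cov):
    """Density of a 2D Gaussian with a full 2x2 covariance matrix."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


def mixture_pdf(point, weights, components):
    """Density of the mixture; `components` is a list of (mean, cov) pairs."""
    total = sum(weights)  # renormalize in case the weights do not sum to 1
    return sum(
        (w / total) * gaussian_pdf_2d(point, mean, cov)
        for w, (mean, cov) in zip(weights, components)
    )


def mixture_mean(weights, components):
    """Weighted mean position of the mixture."""
    total = sum(weights)
    x = sum(w * mean[0] for w, (mean, _) in zip(weights, components)) / total
    y = sum(w * mean[1] for w, (mean, _) in zip(weights, components)) / total
    return (x, y)


# Hypothetical example: two specialized models ("waiting", "starting").
state_probabilities = [0.25, 0.75]
forecasts = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),  # forecast assuming "waiting"
    ((4.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),  # forecast assuming "starting"
]
expected_position = mixture_mean(state_probabilities, forecasts)
density_at_origin = mixture_pdf((0.0, 0.0), state_probabilities, forecasts)
```

Because the mixture keeps one component per motion state, it can stay multimodal when the motion state is uncertain, which a single Gaussian cannot represent.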
Abstract: In this article, we present a novel approach to detect starting motions of cyclists in real-world traffic scenarios based on Motion History Images (MHIs). The method uses a deep Convolutional Neural Network (CNN) with a residual network architecture (ResNet), which is commonly used in image classification and detection tasks. By combining MHIs with a ResNet classifier and performing a frame-by-frame classification of the MHIs, we are able to detect starting motions in image sequences. The detection is performed using a wide-angle stereo camera system at an urban intersection. We compare our algorithm to an existing method for detecting movement transitions of pedestrians, which uses MHIs in combination with a Histogram of Oriented Gradients (HOG)-like descriptor and a Support Vector Machine (SVM) and which we adapted to cyclists. To train and evaluate the methods, a dataset containing MHIs of 394 cyclist starting motions was created. The results show that both methods can be used to detect starting motions of cyclists. Using the SVM approach, we were able to safely detect starting motions 0.506 s on average after the bicycle starts moving, with an F1-score of 97.7%. The ResNet approach achieved an F1-score of 100% at an average detection time of 0.144 s. The ResNet approach outperformed the SVM approach in both robustness against false positive detections and detection time.
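For readers unfamiliar with MHIs, the sketch below shows the classic update rule: pixels where motion is detected are set to a maximum value `tau`, and all other pixels decay by one per frame. The abstract does not state how motion is detected or which decay scheme is used, so the simple frame differencing, the linear decay, and the names `motion_mask`, `update_mhi`, and `build_mhi` are assumptions made for illustration.

```python
def motion_mask(prev_frame, frame, threshold):
    """Mark pixels whose gray value changed by more than `threshold`.

    Frame differencing is an assumption of this sketch; the article may
    derive the motion mask differently.
    """
    return [
        [abs(a - b) > threshold for a, b in zip(prev_row, row)]
        for prev_row, row in zip(prev_frame, frame)
    ]


def update_mhi(mhi, mask, tau):
    """Set moving pixels to `tau`; let all other pixels decay by 1 (floor 0)."""
    return [
        [tau if moving else max(0, prev - 1) for prev, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]


def build_mhi(frames, tau, threshold):
    """Accumulate an MHI over a sequence of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, motion_mask(prev_frame, frame, threshold), tau)
    return mhi


# A bright pixel moving left to right across a 1 x 3 image.
frames = [[[255, 0, 0]], [[0, 255, 0]], [[0, 0, 255]]]
mhi = build_mhi(frames, tau=3, threshold=50)
```

In the resulting image, recently moving pixels are brighter than pixels that moved earlier, so a single frame encodes both where and how recently motion occurred. This is what allows a frame-by-frame image classifier to pick up motion cues.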