Abstract: A crucial piece of information for a pedestrian crowd simulation is the number of agents moving from an origin to a given target. Although this number has a large impact on the simulation, it is usually difficult to determine how many agents should be spawned at each source. Often, the numbers are chosen based on surveys and on the experience of modelers and event organizers. These approaches are important and useful but reach their limits when we want to perform real-time predictions. In this case, static information about the inflow is not sufficient. Instead, we need dynamic information that can be retrieved each time a prediction is started. Nowadays, sensor data such as video footage or GPS tracks of a crowd are often available. If we can estimate from this sensor data the number of pedestrians who come from a certain origin, we can initialize the simulation dynamically. In this study, we use density heatmaps, which can be derived from sensor data, as input for a random forest regressor that predicts the origin distributions. We study three different datasets: a simulated dataset, an experimental dataset, and a hybrid dataset that combines experimental and simulated data. In the hybrid setup, the model is trained on simulated data and then tested on experimental data. The results demonstrate that the random forest model is able to predict the origin distribution from a single density heatmap in all three configurations. This is especially promising for applying the approach to real data, for which often only a limited amount of data is available.
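To make the idea concrete, the following is a minimal sketch of the heatmap-to-origin-distribution regression, not the pipeline used in the study. It assumes scikit-learn and NumPy, and it replaces the crowd simulation with a toy generator: the grid resolution, the agent count, the three blob centers, and the helper names `make_sample` and `make_dataset` are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

GRID = 8          # heatmap resolution per axis (assumed)
N_AGENTS = 100    # agents per snapshot (assumed)
# Hypothetical blob centers, one per origin, in unit-square coordinates.
CENTERS = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])


def make_sample():
    """Return one (flattened density heatmap, origin distribution) pair."""
    # Draw a random split of the agents over the origins.
    shares = rng.dirichlet(np.ones(len(CENTERS)))
    counts = rng.multinomial(N_AGENTS, shares)
    # Toy stand-in for a simulation: agents scatter around their origin's center.
    positions = np.vstack([
        rng.normal(loc=center, scale=0.15, size=(n, 2))
        for center, n in zip(CENTERS, counts)
    ])
    # Count agents per grid cell; agents outside the unit square are dropped.
    heatmap, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1],
        bins=GRID, range=[[0, 1], [0, 1]],
    )
    return heatmap.ravel(), counts / N_AGENTS


def make_dataset(n_samples):
    """Stack n_samples independent samples into feature and target arrays."""
    X, y = zip(*(make_sample() for _ in range(n_samples)))
    return np.array(X), np.array(y)


X_train, y_train = make_dataset(500)
X_test, y_test = make_dataset(100)

# Multi-output regression: one predicted share per origin.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mae = np.abs(y_pred - y_test).mean()
print(f"mean absolute error per origin share: {mae:.3f}")
```

In a hybrid setup of the kind described above, `X_train` and `y_train` would come from simulation runs while `X_test` and `y_test` would come from experimental recordings; here both are drawn from the same toy generator.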