Abstract:Crowd monitoring and analysis in mass events are highly important technologies to support the security of attending persons. Proposed methods based on terrestrial or airborne image/video data often fail in achieving sufficiently accurate results to guarantee a robust service. We present a novel framework for estimating human count, density and motion from video data based on custom tailored object detection techniques, a regression based density estimate and a total variation based optical flow extraction. From the gathered features we present a detailed accuracy analysis versus ground truth measurements. In addition, all information is projected into world coordinates to enable a direct integration with existing geo-information systems. The resulting human counts demonstrate a mean error of 4% to 9% and thus represent a most efficient measure that can be robustly applied in security critical services.