Crowding at the entrances of large events may lead to critical and life-threatening situations, particularly when people start pushing each other to reach the event faster. A system for automatic and timely identification of pushing behavior would help organizers and security forces to intervene early and mitigate dangerous situations. In this paper, we propose a cloud-based deep learning system for early detection of pushing automatically in the live video stream of crowded event entrances. The proposed system relies mainly on two models: a pre-trained deep optical flow and an adapted version of the EfficientNetV2B0 classifier. The optical flow model extracts the characteristics of the crowd motion in the live video stream, while the classifier analyses the crowd motion and annotates pushing patches in the live stream. A novel dataset is generated based on five real-world experiments and their associated ground truth data to train the adapted EfficientNetV2B0 model. The experimental situations simulated a crowded event entrance, and social psychologists manually created the ground truths for each video experiment. Several experiments on the videos and the generated dataset are carried out to evaluate the accuracy and annotation delay time of the proposed system. Furthermore, the experts manually revised the annotation results of the system. Findings indicate that the system identified pushing behaviors with an accuracy rate of 89% within an acceptable delay time.