Abstract:Weakly supervised detection of anomalies in surveillance videos is a challenging task. Going beyond existing works that have deficient capabilities to localize anomalies in long videos, we propose a novel glance and focus network to effectively integrate spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations, and hence result in sub-optimal performance due to the inconsistency of feature magnitudes across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks UCF-Crime and XD-Violence manifest that our method outperforms state-of-the-art approaches.