Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Dec 10, 2024

Hang Du, Guoshun Nan, Jiawen Qian, Wangchenhui Wu, Wendi Deng, Hanqing Mu, Zhenyan Chen, Pengxuan Mao, Xiaofeng Tao, Jun Liu

Figure 1 for Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Figure 2 for Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Figure 3 for Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Figure 4 for Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Share this with someone who'll enjoy it:

Abstract:Recent advancements in video anomaly understanding (VAU) have opened the door to groundbreaking applications in various fields, such as traffic monitoring and industrial automation. While the current benchmarks in VAU predominantly emphasize the detection and localization of anomalies. Here, we endeavor to delve deeper into the practical aspects of VAU by addressing the essential questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we introduce a comprehensive benchmark for Exploring the Causation of Video Anomalies (ECVA). Our benchmark is meticulously designed, with each video accompanied by detailed human annotations. Specifically, each instance of our ECVA involves three sets of human annotations to indicate "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. Building upon this foundation, we propose a novel prompt-based methodology that serves as a baseline for tackling the intricate challenges posed by ECVA. We utilize "hard prompt" to guide the model to focus on the critical parts related to video anomaly segments, and "soft prompt" to establish temporal and spatial relationships within these anomaly segments. Furthermore, we propose AnomEval, a specialized evaluation metric crafted to align closely with human judgment criteria for ECVA. This metric leverages the unique features of the ECVA dataset to provide a more comprehensive and reliable assessment of various video large language models. We demonstrate the efficacy of our approach through rigorous experimental analysis and delineate possible avenues for further investigation into the comprehension of video anomaly causation.

* Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: substantial text overlap with arXiv:2405.00181

View paper on

Share this with someone who'll enjoy it:

Title:Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Paper and Code