Jianhang Chen

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Apr 30, 2024

Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan (+9 more)

Abstract: Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we also introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.

* Codebase: https://github.com/fesvhtr/CUVA
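
The three annotation sets described in the abstract ("what", "why", "how") can be pictured as one record per video. The sketch below is only an illustration of that structure: the class names, field names, and sample values are hypothetical and are not taken from the CUVA repository, whose actual file format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnomalySegment:
    """The "what": anomaly type, time span, and event description (hypothetical fields)."""
    anomaly_type: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    description: str

    def duration(self) -> float:
        """Length of the annotated segment in seconds."""
        return self.end_s - self.start_s


@dataclass
class CuvaStyleRecord:
    """One benchmark instance with the three annotation sets (hypothetical layout)."""
    video_id: str
    what: List[AnomalySegment]
    why: str  # natural-language explanation of the cause
    how: str  # free text describing the effect

    def is_complete(self) -> bool:
        """True when all three annotation sets are present and non-empty."""
        return bool(self.what) and bool(self.why.strip()) and bool(self.how.strip())


# Made-up sample values, for illustration only (not from the dataset).
example = CuvaStyleRecord(
    video_id="demo_0001",
    what=[AnomalySegment("traffic accident", 12.0, 19.5,
                         "A car runs a red light and hits a van.")],
    why="The driver ignored the red light.",
    how="Two vehicles are damaged and the intersection is blocked.",
)
```

Keeping the time span on the "what" annotation and the cause and effect as free text mirrors the wording of the abstract; a real loader should follow whatever schema the repository documents.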

Multi-View Matching Network for 6D Pose Estimation

Nov 27, 2019

Daniel Mas Montserrat, Jianhang Chen, Qian Lin, Jan P. Allebach, Edward J. Delp

Abstract: Applications that interact with the real world such as augmented reality or robot manipulation require a good understanding of the location and pose of the surrounding objects. In this paper, we present a new approach to estimate the 6 Degree of Freedom (DoF) or 6D pose of objects from a single RGB image. Our approach can be paired with an object detection and segmentation method to estimate, refine and track the pose of the objects by matching the input image with rendered images.
