Title: Grounding Partially-Defined Events in Multimodal Data

Oct 07, 2024

Kate Sanders, Reno Kriz, David Etter, Hannah Recknor, Alexander Martin, Cameron Carpenter, Jingyang Lin, Benjamin Van Durme

Abstract: How are we able to learn about complex current events just from short snippets of video? While natural language enables straightforward ways to represent under-specified, partially observable events, visual data does not facilitate analogous methods and, consequently, introduces unique challenges in event understanding. With the growing prevalence of vision-capable AI agents, these systems must be able to model events from collections of unstructured video data. To tackle robust event modeling in multimodal settings, we introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task. We propose a corresponding benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities. We propose a collection of LLM-driven approaches to the task of multimodal event analysis, and evaluate them on MultiVENT-G. Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.

* Preprint; 9 pages; 2024 EMNLP Findings
