Weiji Li

Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

Sep 22, 2023

Oana Ignat, Santiago Castro, Weiji Li, Rada Mihalcea

Abstract: We introduce the task of automatic human action co-occurrence identification, i.e., determining whether two human actions can co-occur in the same interval of time. We create and make publicly available the ACE (Action Co-occurrencE) dataset, consisting of a large graph of ~12k co-occurring pairs of visual actions and their corresponding video clips. We describe graph link prediction models that leverage visual and textual information to automatically infer if two actions are co-occurring. We show that graphs are particularly well suited to capture relations between human actions, and the learned graph representations are effective for our task and capture novel and relevant information across different data domains. The ACE dataset and the code introduced in this paper are publicly available at https://github.com/MichiganNLP/vlog_action_co-occurrence.
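To make the framing of co-occurrence as graph link prediction concrete, here is a minimal sketch. It is not the authors' model: it uses a plain Jaccard neighbor-overlap heuristic with no visual or textual features, and the action names, the `known_pairs` list, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy graph: nodes are actions, edges are known co-occurring pairs.
# These pairs are made up for illustration and are not taken from the ACE dataset.
known_pairs = [
    ("chop onion", "heat pan"),
    ("chop onion", "wash vegetables"),
    ("heat pan", "add oil"),
    ("wash vegetables", "add oil"),
    ("heat pan", "wash vegetables"),
    ("fold laundry", "watch tv"),
]


def build_adjacency(pairs):
    """Map each action to the set of actions it is known to co-occur with."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def jaccard_score(adjacency, a, b):
    """Score a candidate link by the overlap of the two actions' neighbor sets."""
    neighbors_a = adjacency.get(a, set())
    neighbors_b = adjacency.get(b, set())
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def rank_candidate_links(pairs):
    """Rank every unobserved action pair, highest Jaccard score first."""
    adjacency = build_adjacency(pairs)
    existing = {frozenset(pair) for pair in pairs}
    candidates = [
        (a, b)
        for a, b in combinations(sorted(adjacency), 2)
        if frozenset((a, b)) not in existing
    ]
    scored = [(jaccard_score(adjacency, a, b), a, b) for a, b in candidates]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


ranking = rank_candidate_links(known_pairs)
for score, a, b in ranking[:3]:
    print(f"{a} <-> {b}: {score:.2f}")
```

In this toy graph, "add oil" and "chop onion" rank first because they share exactly the same neighbors, even though no edge between them was observed. A learned model, as described in the abstract, would replace this heuristic with representations built from visual and textual information.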

WhyAct: Identifying Action Reasons in Lifestyle Vlogs

Sep 09, 2021

Oana Ignat, Santiago Castro, Hanwen Miao, Weiji Li, Rada Mihalcea

Abstract: We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.

* Accepted at EMNLP 2021
