Abstract: We aim to automatically identify the reasons behind human actions in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in a video.
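For illustration, below is a minimal sketch of one common way such a multimodal model can be structured: late fusion of precomputed video and transcript embeddings, followed by a classifier over candidate reasons. This is not the WhyAct model; all dimensions, module names, and the label space are assumptions.

```python
# Illustrative late-fusion classifier over visual and textual features.
# NOT the paper's architecture; feature extractors and sizes are assumed.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, video_dim: int = 512, text_dim: int = 768,
                 hidden_dim: int = 256, num_reasons: int = 10):
        super().__init__()
        # Project each modality into a shared space, then classify
        # the concatenated representation.
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_reasons)

    def forward(self, video_feats: torch.Tensor,
                text_feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat(
            [self.video_proj(video_feats), self.text_proj(text_feats)],
            dim=-1,
        )
        return self.classifier(fused)  # one logit per candidate reason

# Example: precomputed clip and transcript embeddings for a batch of 4.
model = LateFusionClassifier()
scores = model(torch.randn(4, 512), torch.randn(4, 768))
print(scores.shape)  # torch.Size([4, 10])
```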
Abstract: Naturalistic driving data (NDD) is an important source of information for understanding crash causation and human factors and for developing crash avoidance countermeasures. Videos recorded while driving are often included in such datasets. Although NDD typically contains a large amount of video, only a small portion of it can be annotated by human coders and used for research, leaving most of the video data unused. In this paper, we explore a computer vision method to automatically extract the needed information from videos. More specifically, we develop a 3D ConvNet algorithm to automatically extract cell-phone-related driver behaviors from videos. Our experiments show that the method extracts video chunks, most of which (~79%) contain the target cell phone behaviors. In conjunction with human review of the extracted chunks, this approach can find cell-phone-related driver behaviors much more efficiently than viewing the videos in full.
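For illustration, below is a minimal PyTorch sketch of a 3D convolutional video classifier of the general kind described, assuming fixed-size RGB clips and a binary cell-phone-use label. This is not the paper's architecture; all layer sizes are assumed.

```python
# Illustrative 3D ConvNet clip classifier. NOT the authors' model;
# input shape (channels, frames, height, width) and sizes are assumed.
import torch
import torch.nn as nn

class Simple3DConvNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions slide over time as well as space,
            # capturing motion cues across consecutive frames.
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool to a single feature vector
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, 3, frames, height, width)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

# Example: score a batch of two 16-frame RGB clips at 112x112 resolution.
model = Simple3DConvNet()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 2])
```

Sliding such a classifier over a long recording and keeping high-scoring windows is one plausible way to produce the candidate "chunks" that human coders then review.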