Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zecheng Yu

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote(+91 more)

Figure 1 for Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Figure 2 for Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Figure 3 for Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Figure 4 for Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Abstract:We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). More than 800 participants from 13 cities worldwide performed these activities in 131 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,422 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources will be open sourced to fuel new research in the community.

Via

Access Paper or Ask Questions

Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

Feb 10, 2023

Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Figure 1 for Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

Figure 2 for Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

Figure 3 for Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

Figure 4 for Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

Abstract:Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning. However, the definition of affordance in existing datasets often: 1) mix up affordance with object functionality; 2) confuse affordance with goal-related action; and 3) ignore human motor capacity. This paper proposes an efficient annotation scheme to address these issues by combining goal-irrelevant motor actions and grasp types as affordance labels and introducing the concept of mechanical action to represent the action possibilities between two objects. We provide new annotations by applying this scheme to the EPIC-KITCHENS dataset and test our annotation with tasks such as affordance recognition, hand-object interaction hotspots prediction, and cross-domain evaluation of affordance. The results show that models trained with our annotation can distinguish affordance from other concepts, predict fine-grained interaction possibilities on objects, and generalize through different domains.

* WACV 2023. Refined version of Workshop article arXiv:2206.05424

Via

Access Paper or Ask Questions

Precise Affordance Annotation for Egocentric Action Video Datasets

Jun 11, 2022

Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Figure 1 for Precise Affordance Annotation for Egocentric Action Video Datasets

Figure 2 for Precise Affordance Annotation for Egocentric Action Video Datasets

Figure 3 for Precise Affordance Annotation for Egocentric Action Video Datasets

Figure 4 for Precise Affordance Annotation for Egocentric Action Video Datasets

Abstract:Object affordance is an important concept in human-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning. However, existing datasets often: 1) mix up affordance with object functionality; 2) confuse affordance with goal-related action; and 3) ignore human motor capacity. This paper proposes an efficient annotation scheme to address these issues by combining goal-irrelevant motor actions and grasp types as affordance labels and introducing the concept of mechanical action to represent the action possibilities between two objects. We provide new annotations by applying this scheme to the EPIC-KITCHENS dataset and test our annotation with tasks such as affordance recognition. We qualitatively verify that models trained with our annotation can distinguish affordance and mechanical actions.

* Technical report for CVPR 2022 EPIC-Ego4D Workshop

Via

Access Paper or Ask Questions