Zizheng Zhou

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions

Dec 27, 2024

Xiaoyang Liu, Boran Wen, Xinpeng Liu, Zizheng Zhou, Hongwei Fan, Cewu Lu, Lizhuang Ma, Yulong Chen, Yong-Lu Li

Abstract: Spatio-temporal Human-Object Interaction (ST-HOI) understanding aims to detect HOIs in videos, which is crucial for activity understanding. However, existing whole-body-object interaction video benchmarks overlook the diversity of open-world objects: they typically provide only a limited set of predefined object classes. We therefore introduce a new open-world benchmark, Grounding Interacted Objects (GIO), which includes 1,098 interacted object classes and 290K annotated interacted object boxes. Accordingly, we propose an object grounding task that requires vision systems to discover interacted objects. Although today's detectors and grounding methods have achieved great success, they perform poorly at localizing the diverse and rare objects in GIO. This reveals the limitations of current vision systems and poses a significant challenge. We therefore explore leveraging spatio-temporal cues for object grounding and propose a 4D question-answering framework (4D-QA) to discover interacted objects in diverse videos. In extensive experiments, our method significantly outperforms current baselines. Data and code will be publicly available at https://github.com/DirtyHarryLYL/HAKE-AVA.

* To be published in the Proceedings of AAAI 2025. The first three authors contributed equally. Project: https://github.com/DirtyHarryLYL/HAKE-AVA

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Oct 11, 2023

Xinpeng Liu, Yong-Lu Li, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu

Abstract: The goal of motion understanding is to establish a reliable mapping between motion and action semantics, which is a challenging many-to-many problem. An abstract action semantic (e.g., walking forward) can be conveyed by perceptually diverse motions (walking with arms raised or swinging), while a single motion can carry different semantics depending on its context and intention. This makes an elegant mapping between the two difficult. Previous attempts adopted direct-mapping paradigms with limited reliability. Moreover, current automatic metrics fail to provide reliable assessments of the consistency between motions and action semantics. We identify the source of these problems as the significant gap between the two modalities. To bridge this gap, we propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with appropriate abstraction, interpretability, and generality. Using KP as a mediator, we can unify a motion knowledge base and build a motion understanding system. Meanwhile, motions can be automatically converted to KP, and KP to text descriptions, with no subjective bias, which inspires Kinematic Prompt Generation (KPG), a novel automatic benchmark for motion generation. In extensive experiments, our approach outperforms other methods. Our code and data will be made publicly available at https://foruck.github.io/KP.

* Yong-Lu Li and Cewu Lu are the corresponding authors. The project page is available at https://foruck.github.io/KP/
