Title: A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

Jun 26, 2023

Chao Zhang, Shiwei Wu, Sirui Zhao, Tong Xu, Enhong Chen

Abstract: Affordance-centric Question-driven Task Completion (AQTC) for Egocentric Assistant introduces a groundbreaking scenario in which AI assistants learn from instructional videos and then give users step-by-step guidance on operating devices. In this paper, we present a solution that enhances video alignment to improve multi-step inference. Specifically, we first use VideoCLIP to generate video-script alignment features. Next, we ground the question-relevant content in the instructional videos. Then, we reweight the multimodal context to emphasize prominent features. Finally, we adopt a GRU to conduct multi-step inference. Comprehensive experiments demonstrate the effectiveness and superiority of our method, which secured 2nd place in the CVPR'2023 AQTC challenge. Our code is available at https://github.com/zcfinal/LOVEU-CVPR23-AQTC.

* 5 pages, 1 figure, technical report for Track 3 of the CVPR 2023 LOVEU challenge
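The last two stages named in the abstract, context reweighting and GRU-based multi-step inference, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see their repository for that). It makes several assumptions: plain Python lists stand in for VideoCLIP features, the reweighting is a simple scaled dot-product softmax, the GRU weights are random rather than learned, biases are omitted, and the names `reweight`, `GRUCell`, `multi_step_inference`, and `score_w` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(query, segments):
    """Assumed reweighting: segments more similar to the query get more weight."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, seg) / scale for seg in segments])
    context = [sum(w * seg[i] for w, seg in zip(weights, segments))
               for i in range(len(query))]
    return context, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Bias-free GRU cell with random weights (placeholders for learned ones)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim

        def mat(cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(hidden_dim)]

        self.Wz, self.Uz = mat(input_dim), mat(hidden_dim)
        self.Wr, self.Ur = mat(input_dim), mat(hidden_dim)
        self.Wh, self.Uh = mat(input_dim), mat(hidden_dim)

    @staticmethod
    def _affine(W, x, U, h):
        return [dot(w_row, x) + dot(u_row, h) for w_row, u_row in zip(W, U)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine(self.Wz, x, self.Uz, h)]  # update gate
        r = [sigmoid(v) for v in self._affine(self.Wr, x, self.Ur, h)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._affine(self.Wh, x, self.Uh, gated)]
        # One common convention: h_t = (1 - z) * h_{t-1} + z * candidate
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def multi_step_inference(context, steps, cell, score_w):
    """For each step, score every candidate and carry the winner's state forward."""
    h = [0.0] * cell.hidden_dim
    chosen = []
    for candidates in steps:
        scored = []
        for idx, cand in enumerate(candidates):
            h_new = cell.step(context + cand, h)  # concatenate context and candidate
            scored.append((dot(score_w, h_new), idx, h_new))
        _, best_idx, h = max(scored, key=lambda t: t[0])
        chosen.append(best_idx)
    return chosen


# Toy data: one question vector, three video segments, two steps of candidates.
question = [1.0, 0.0, 0.0, 0.0]
segments = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
context, weights = reweight(question, segments)

cell = GRUCell(input_dim=8, hidden_dim=3)
steps = [
    [[0.2, 0.1, 0.0, 0.3], [0.9, 0.0, 0.1, 0.0]],
    [[0.0, 0.5, 0.5, 0.0], [0.1, 0.1, 0.1, 0.1], [0.7, 0.0, 0.0, 0.2]],
]
answers = multi_step_inference(context, steps, cell, score_w=[0.5, -0.2, 0.1])
print(weights, answers)
```

Carrying the hidden state of the selected candidate into the next step is what makes the inference multi-step: each choice conditions the scoring of later ones. Whether the state is advanced with the predicted or the ground-truth answer is a design choice this sketch does not take from the paper.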
