Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liqi Cheng

ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

May 23, 2025

Yuchen He, Jianbing Lv, Liqi Cheng, Lingyu Meng, Dazhen Deng, Yingcai Wu

Figure 1 for ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Figure 2 for ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Figure 3 for ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Figure 4 for ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Abstract:Temporal Action Localization (TAL) aims to detect the start and end timestamps of actions in a video. However, the training of TAL models requires a substantial amount of manually annotated data. Data programming is an efficient method to create training labels with a series of human-defined labeling functions. However, its application in TAL faces difficulties of defining complex actions in the context of temporal video frames. In this paper, we propose ProTAL, a drag-and-link video programming framework for TAL. ProTAL enables users to define \textbf{key events} by dragging nodes representing body parts and objects and linking them to constrain the relations (direction, distance, etc.). These definitions are used to generate action labels for large-scale unlabelled videos. A semi-supervised method is then employed to train TAL models with such labels. We demonstrate the effectiveness of ProTAL through a usage scenario and a user study, providing insights into designing video programming framework.

* Accepted at CHI'25

Via

Access Paper or Ask Questions

ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis

Feb 25, 2024

Yuchen He, Zeqing Yuan, Yihong Wu, Liqi Cheng, Dazhen Deng, Yingcai Wu

Abstract:The immense popularity of racket sports has fueled substantial demand in tactical analysis with broadcast videos. However, existing manual methods require laborious annotation, and recent attempts leveraging video perception models are limited to low-level annotations like ball trajectories, overlooking tactics that necessitate an understanding of stroke techniques. State-of-the-art action segmentation models also struggle with technique recognition due to frequent occlusions and motion-induced blurring in racket sports videos. To address these challenges, We propose ViSTec, a Video-based Sports Technique recognition model inspired by human cognition that synergizes sparse visual data with rich contextual insights. Our approach integrates a graph to explicitly model strategic knowledge in stroke sequences and enhance technique recognition with contextual inductive bias. A two-stage action perception model is jointly trained to align with the contextual knowledge in the graph. Experiments demonstrate that our method outperforms existing models by a significant margin. Case studies with experts from the Chinese national table tennis team validate our model's capacity to automate analysis for technical actions and tactical strategies. More details are available at: https://ViSTec2024.github.io/.

* accepted by AAAI-24 Main Track

Via

Access Paper or Ask Questions