Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Guodong Ding

Condensing Action Segmentation Datasets via Generative Network Inversion

Mar 18, 2025

Guodong Ding, Rongyu Chen, Angela Yao

Abstract:This work presents the first condensation approach for procedural video datasets used in temporal action segmentation. We propose a condensation framework that leverages generative prior learned from the dataset and network inversion to condense data into compact latent codes with significant storage reduced across temporal and channel aspects. Orthogonally, we propose sampling diverse and representative action sequences to minimize video-wise redundancy. Our evaluation on standard benchmarks demonstrates consistent effectiveness in condensing TAS datasets and achieving competitive performances. Specifically, on the Breakfast dataset, our approach reduces storage by over 500$\times$ while retaining 83% of the performance compared to training with the full dataset. Furthermore, when applied to a downstream incremental learning task, it yields superior performance compared to the state-of-the-art.

* 10 pages, 3 figures, 5 tables, Accepted to CVPR2025

Via

Access Paper or Ask Questions

OnlineTAS: An Online Baseline for Temporal Action Segmentation

Nov 02, 2024

Qing Zhong, Guodong Ding, Angela Yao

Abstract:Temporal context plays a significant role in temporal action segmentation. In an offline setting, the context is typically captured by the segmentation network after observing the entire sequence. However, capturing and using such context information in an online setting remains an under-explored problem. This work presents the an online framework for temporal action segmentation. At the core of the framework is an adaptive memory designed to accommodate dynamic changes in context over time, alongside a feature augmentation module that enhances the frames with the memory. In addition, we propose a post-processing approach to mitigate the severe over-segmentation in the online setting. On three common segmentation benchmarks, our approach achieves state-of-the-art performance.

* 16 pages, 4 figures, 12 tables. Accepted to NeurIPS 2024

Via

Access Paper or Ask Questions

Coherent Temporal Synthesis for Incremental Action Segmentation

Mar 10, 2024

Guodong Ding, Hans Golong, Angela Yao

Abstract:Data replay is a successful incremental learning technique for images. It prevents catastrophic forgetting by keeping a reservoir of previous data, original or synthesized, to ensure the model retains past knowledge while adapting to novel concepts. However, its application in the video domain is rudimentary, as it simply stores frame exemplars for action recognition. This paper presents the first exploration of video data replay techniques for incremental action segmentation, focusing on action temporal modeling. We propose a Temporally Coherent Action (TCA) model, which represents actions using a generative model instead of storing individual frames. The integration of a conditioning variable that captures temporal coherence allows our model to understand the evolution of action features over time. Therefore, action segments generated by TCA for replay are diverse and temporally coherent. In a 10-task incremental setup on the Breakfast dataset, our approach achieves significant increases in accuracy for up to 22% compared to the baselines.

* 10 pages, 6 figures, 5 tables, accepted to CVPR 2024

Via

Access Paper or Ask Questions

Every Mistake Counts in Assembly

Jul 31, 2023

Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao

Figure 1 for Every Mistake Counts in Assembly

Figure 2 for Every Mistake Counts in Assembly

Figure 3 for Every Mistake Counts in Assembly

Figure 4 for Every Mistake Counts in Assembly

Abstract:One promising use case of AI assistants is to help with complex procedures like cooking, home repair, and assembly tasks. Can we teach the assistant to interject after the user makes a mistake? This paper targets the problem of identifying ordering mistakes in assembly procedures. We propose a system that can detect ordering mistakes by utilizing a learned knowledge base. Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes. Spatial beliefs depict the topological relationship of the assembling components, while temporal beliefs aggregate prerequisite actions as ordering constraints. With an episodic memory design, our algorithm can dynamically update and construct the belief sets as more actions are observed, all in an online fashion. We demonstrate experimentally that our inferred spatial and temporal beliefs are capable of identifying incorrect orderings in real-world action sequences. To construct the spatial beliefs, we collect a new set of coarse-level action annotations for Assembly101 based on the positioning of the toy parts. Finally, we demonstrate the superior performance of our belief inference algorithm in detecting ordering mistakes on the Assembly101 dataset.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

Temporal Action Segmentation: An Analysis of Modern Technique

Oct 19, 2022

Guodong Ding, Fadime Sener, Angela Yao

Figure 1 for Temporal Action Segmentation: An Analysis of Modern Technique

Figure 2 for Temporal Action Segmentation: An Analysis of Modern Technique

Figure 3 for Temporal Action Segmentation: An Analysis of Modern Technique

Figure 4 for Temporal Action Segmentation: An Analysis of Modern Technique

Abstract:Temporal action segmentation from videos aims at the dense labeling of video frames with multiple action classes in minutes-long videos. Categorized as a long-range video understanding task, researchers have proposed an extended collection of methods and examined their performance using various benchmarks. Despite the rapid development of action segmentation techniques in recent years, there has been no systematic survey in such fields. To this end, in this survey, we analyze and summarize the main contributions and trends for this task. Specifically, we first examine the task definition, common benchmarks, types of supervision, and popular evaluation measures. Furthermore, we systematically investigate two fundamental aspects of this topic, i.e., frame representation and temporal modeling, which are widely and extensively studied in the literature. We then comprehensively review existing temporal action segmentation works, each categorized by their form of supervision. Finally, we conclude our survey by highlighting and identifying several open topics for research. In addition, we supplement our survey with a curated list of temporal action segmentation resources, which is available at https://github.com/atlas-eccv22/awesome-temporal-action-segmentation.

* 26 pages, 10 figures, 9 tables

Via

Access Paper or Ask Questions

Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Jul 21, 2022

Guodong Ding, Angela Yao

Figure 1 for Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Figure 2 for Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Figure 3 for Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Figure 4 for Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Abstract:We present a semi-supervised learning approach to the temporal action segmentation task. The goal of the task is to temporally detect and segment actions in long, untrimmed procedural videos, where only a small set of videos are densely labelled, and a large collection of videos are unlabelled. To this end, we propose two novel loss functions for the unlabelled data: an action affinity loss and an action continuity loss. The action affinity loss guides the unlabelled samples learning by imposing the action priors induced from the labelled set. Action continuity loss enforces the temporal continuity of actions, which also provides frame-wise classification supervision. In addition, we propose an Adaptive Boundary Smoothing (ABS) approach to build coarser action boundaries for more robust and reliable learning. The proposed loss functions and ABS were evaluated on three benchmarks. Results show that they significantly improved action segmentation performance with a low amount (5% and 10%) of labelled data and achieved comparable results to full supervision with 50% labelled data. Furthermore, ABS succeeded in boosting performance when integrated into fully-supervised learning.

* 16 pages, 5 figures

Via

Access Paper or Ask Questions

Rapid Person Re-Identification via Sub-space Consistency Regularization

Jul 13, 2022

Qingze Yin, Guanan Wang, Guodong Ding, Qilei Li, Shaogang Gong, Zhenmin Tang

Figure 1 for Rapid Person Re-Identification via Sub-space Consistency Regularization

Figure 2 for Rapid Person Re-Identification via Sub-space Consistency Regularization

Figure 3 for Rapid Person Re-Identification via Sub-space Consistency Regularization

Figure 4 for Rapid Person Re-Identification via Sub-space Consistency Regularization

Abstract:Person Re-Identification (ReID) matches pedestrians across disjoint cameras. Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation as well as complex quick-sort algorithms. Recently, some works propose to yield binary encoded person descriptors which instead only require fast Hamming distance computation and simple counting-sort algorithms. However, the performances of such binary encoded descriptors, especially with short code (e.g., 32 and 64 bits), are hardly satisfactory given the sparse binary space. To strike a balance between the model accuracy and efficiency, we propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by $0.25$ times than real-value features under the same dimensions whilst maintaining a competitive accuracy, especially under short codes. SCR transforms real-value features vector (e.g., 2048 float32) with short binary codes (e.g., 64 bits) by first dividing real-value features vector into $M$ sub-spaces, each with $C$ clustered centroids. Thus the distance between two samples can be expressed as the summation of the respective distance to the centroids, which can be sped up by offline calculation and maintained via a look-up table. On the other side, these real-value centroids help to achieve significantly higher accuracy than using binary code. Lastly, we convert the distance look-up table to be integer and apply the counting-sort algorithm to speed up the ranking stage. We also propose a novel consistency regularization with an iterative framework. Experimental results on Market-1501 and DukeMTMC-reID show promising and exciting results. Under short code, our proposed SCR enjoys Real-value-level accuracy and Hashing-level speed.

Via

Access Paper or Ask Questions

Temporal Action Segmentation with High-level Complex Activity Labels

Aug 15, 2021

Guodong Ding, Angela Yao

Figure 1 for Temporal Action Segmentation with High-level Complex Activity Labels

Figure 2 for Temporal Action Segmentation with High-level Complex Activity Labels

Figure 3 for Temporal Action Segmentation with High-level Complex Activity Labels

Figure 4 for Temporal Action Segmentation with High-level Complex Activity Labels

Abstract:Over the past few years, the success in action recognition on short trimmed videos has led more investigations towards the temporal segmentation of actions in untrimmed long videos. Recently, supervised approaches have achieved excellent performance in segmenting complex human actions in untrimmed videos. However, besides action labels, such approaches also require the start and end points of each action, which is expensive and tedious to collect. In this paper, we aim to learn the action segments taking only the high-level activity labels as input. Under the setting where no action-level supervision is provided, Hungarian matching is often used to find the mapping between segments and ground truth actions to evaluate the model and report the performance. On the one hand, we show that with the high-level supervision, we are able to generalize the Hungarian matching settings from the current video and activity level to the global level. The extended global-level matching allows for the shared actions across activities. On the other hand, we propose a novel action discovery framework that automatically discovers constituent actions in videos with the activity classification task. Specifically, we define a finite number of prototypes to form a dual representation of a video sequence. These collectively learned prototypes are considered discovered actions. This classification setting endows our approach the capability of discovering potentially shared actions across multiple complex activities. Extensive experiments demonstrate that the discovered actions are helpful in performing temporal action segmentation and activity recognition.

* 11 pages, 6 figures

Via

Access Paper or Ask Questions

Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Jun 04, 2019

Guodong Ding, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli

Figure 1 for Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Figure 2 for Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Figure 3 for Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Figure 4 for Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Abstract:Person re-identification aims to establish the correct identity correspondences of a person moving through a non-overlapping multi-camera installation. Recent advances based on deep learning models for this task mainly focus on supervised learning scenarios where accurate annotations are assumed to be available for each setup. Annotating large scale datasets for person re-identification is demanding and burdensome, which renders the deployment of such supervised approaches to real-world applications infeasible. Therefore, it is necessary to train models without explicit supervision in an autonomous manner. In this paper, we propose an elegant and practical clustering approach for unsupervised person re-identification based on the cluster validity consideration. Concretely, we explore a fundamental concept in statistics, namely \emph{dispersion}, to achieve a robust clustering criterion. Dispersion reflects the compactness of a cluster when employed at the intra-cluster level and reveals the separation when measured at the inter-cluster level. With this insight, we design a novel Dispersion-based Clustering (DBC) approach which can discover the underlying patterns in data. This approach considers a wider context of sample-level pairwise relationships to achieve a robust cluster affinity assessment which handles the complications may arise due to prevalent imbalanced data distributions. Additionally, our solution can automatically prioritize standalone data points and prevents inferior clustering. Our extensive experimental analysis on image and video re-identification benchmarks demonstrate that our method outperforms the state-of-the-art unsupervised methods by a significant margin. Code is available at https://github.com/gddingcs/Dispersion-based-Clustering.git.

* 10 pages, 6 figures, 4 tables

Via

Access Paper or Ask Questions

Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

May 16, 2018

Guodong Ding, Shanshan Zhang, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli

Figure 1 for Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

Figure 2 for Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

Figure 3 for Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

Figure 4 for Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

Abstract:Person re-identification aims to match a person's identity across multiple camera streams. Deep neural networks have been successfully applied to the challenging person re-identification task. One remarkable bottleneck is that the existing deep models are data hungry and require large amounts of labeled training data. Acquiring manual annotations for pedestrian identity matchings in large-scale surveillance camera installations is a highly cumbersome task. Here, we propose the first semi-supervised approach that performs pseudo-labeling by considering complex relationships between unlabeled and labeled training samples in the feature space. Our approach first approximates the actual data manifold by learning a generative model via adversarial training. Given the trained model, data augmentation can be performed by generating new synthetic data samples which are unlabeled. An open research problem is how to effectively use this additional data for improved feature learning. To this end, this work proposes a novel Feature Affinity based Pseudo-Labeling (FAPL) approach with two possible label encodings under a unified setting. Our approach measures the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks. FAPL trains with the joint supervision of cross-entropy loss together with a center regularization term, which not only ensures discriminative feature representation learning but also simultaneously predicts pseudo-labels for unlabeled data. Our extensive experiments on two standard large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate significant performance boosts over closely related competitors and outperforms state-of-the-art person re-identification techniques in most cases.

* 12 pages, 4 figures, 9 tables

Via

Access Paper or Ask Questions