Murad Popattia

Learning by Aligning 2D Skeleton Sequences in Time

May 31, 2023

Quoc-Huy Tran, Muhammad Ahmed, Ahmed Mehmood, M. Hassan Ahmed, Murad Popattia, Andrey Konin, M. Zeeshan Zia

Abstract: This paper presents a novel self-supervised temporal video alignment framework which is useful for several fine-grained human activity understanding applications. In contrast with the state-of-the-art method of CASA, where sequences of 3D skeleton coordinates are taken directly as input, our key idea is to use sequences of 2D skeleton heatmaps as input. Given 2D skeleton heatmaps, we utilize a video transformer which performs self-attention in the spatial and temporal domains for extracting effective spatiotemporal and contextual features. In addition, we introduce simple heatmap augmentation techniques based on 2D skeletons for self-supervised learning. Despite the lack of 3D information, our approach achieves not only higher accuracy but also better robustness against missing and noisy keypoints than CASA. Extensive evaluations on three public datasets, i.e., Penn Action, IKEA ASM, and H2O, demonstrate that our approach outperforms previous methods in different fine-grained human activity understanding tasks, i.e., phase classification, phase progression, video alignment, and fine-grained frame retrieval.
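
The abstract does not say how the 2D skeleton heatmaps are built. As a rough illustration only, the sketch below renders each 2D keypoint as a Gaussian blob scaled by its confidence, which is one common way to turn keypoints into heatmaps. The function name `keypoints_to_heatmaps`, the `sigma` parameter, and the `(x, y, confidence)` input layout are assumptions made for this example, not details taken from the paper.

```python
import math


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per 2D keypoint (hypothetical helper).

    keypoints: list of (x, y, confidence) tuples in pixel coordinates.
    Returns a list of height x width grids (lists of lists of floats).
    """
    heatmaps = []
    for x, y, conf in keypoints:
        grid = [[0.0] * width for _ in range(height)]
        if conf > 0.0:  # a missing keypoint is left as an all-zero map
            for row in range(height):
                for col in range(width):
                    dist_sq = (col - x) ** 2 + (row - y) ** 2
                    grid[row][col] = conf * math.exp(-dist_sq / (2.0 * sigma ** 2))
        heatmaps.append(grid)
    return heatmaps


# Two keypoints on an 8x8 grid; the second one is missing (confidence 0).
maps = keypoints_to_heatmaps([(3, 4, 1.0), (0, 0, 0.0)], height=8, width=8)
```

In a representation like this, a missing keypoint simply produces an empty channel and a noisy one shifts a blob slightly, which gives some intuition for why heatmap inputs might tolerate keypoint errors.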

Guiding Attention using Partial-Order Relationships for Image Captioning

Apr 15, 2022

Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

Abstract: The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions for images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this approach by creating a guided attention network mechanism that exploits the relationship between the visual scene and text descriptions using spatial features from the image, high-level information from the topics, and temporal context from caption generation, which are embedded together in an ordered embedding space. A pairwise ranking objective is used for training this embedding space, which allows similar images, topics, and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy and hence helps the model to produce more visually accurate captions. Experimental results on the MSCOCO dataset show that our approach is competitive with many state-of-the-art models on various evaluation metrics.

* Accepted at CVPRW
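
The abstract mentions a pairwise ranking objective over an ordered embedding space but gives no formula. The sketch below shows one possible shape for such an objective, under assumptions: a coordinate-wise order-violation penalty combined with a hinge (max-margin) ranking loss. The names `order_violation` and `pairwise_ranking_loss`, the `margin` value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
def order_violation(general, specific):
    """Squared penalty; zero when specific >= general in every coordinate."""
    return sum(max(0.0, g - s) ** 2 for g, s in zip(general, specific))


def pairwise_ranking_loss(anchor, positive, negatives, margin=0.01):
    """Hinge loss: the matching pair should violate the order less than
    each mismatched pair, by at least `margin` (assumed value)."""
    pos = order_violation(anchor, positive)
    return sum(
        max(0.0, margin + pos - order_violation(anchor, neg))
        for neg in negatives
    )


# Toy 2D embeddings: the caption is treated as the more general item.
caption = [0.2, 0.1]
image = [0.5, 0.4]   # dominates the caption coordinate-wise
other = [0.0, 0.0]   # does not dominate the caption

loss_good = pairwise_ranking_loss(caption, image, [other])
loss_bad = pairwise_ranking_loss(caption, other, [image])
```

Here `loss_good` is zero because the matching pair already respects the order by more than the margin, while `loss_bad` is positive because the mismatched pair is ranked ahead of the matching one.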
