Labina Shrestha

3D Convolutional with Attention for Action Recognition

Jun 05, 2022

Labina Shrestha, Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon

Abstract: Human action recognition is a challenging task in computer vision. Current action recognition methods use computationally expensive models to learn the spatio-temporal dependencies of an action. Models that process RGB channels and optical flow separately, models that use a two-stream fusion technique, and models that combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network are a few examples of such complex models. Fine-tuning these models is also computationally expensive. This paper proposes a deep neural network architecture for learning these dependencies that consists of a 3D convolutional layer, fully connected (FC) layers, and an attention layer. It is simpler to implement and achieves competitive performance on the UCF-101 dataset. The proposed method first learns spatial and temporal features of actions through a 3D CNN, and the attention mechanism then helps the model focus on the features that are essential for recognition.
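To make the described pipeline concrete, here is a minimal NumPy sketch of a 3D convolution, an FC layer, and an attention layer stacked in that order. It is not the authors' implementation: the single conv layer with ReLU, the layer sizes, and the soft-attention pooling (a learned vector `v_att` scores each spatio-temporal position, a softmax turns scores into weights, and a weighted sum pools them) are assumptions chosen for illustration, and all weights are random rather than trained.

```python
import numpy as np


def conv3d_valid(x, w, b):
    """Naive 'valid' 3D convolution (cross-correlation, stride 1).

    x: (C_in, T, H, W) clip; w: (C_out, C_in, kT, kH, kW); b: (C_out,).
    Returns an array of shape (C_out, T-kT+1, H-kH+1, W-kW+1).
    """
    c_out, _, kt, kh, kw = w.shape
    _, t, h, wd = x.shape
    to, ho, wo = t - kt + 1, h - kh + 1, wd - kw + 1
    out = np.zeros((c_out, to, ho, wo))
    # Add the contribution of each kernel offset at every output position at once.
    for dt in range(kt):
        for dh in range(kh):
            for dw in range(kw):
                patch = x[:, dt:dt + to, dh:dh + ho, dw:dw + wo]
                out += np.einsum("oc,cthw->othw", w[:, :, dt, dh, dw], patch)
    return out + b[:, None, None, None]


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(clip, p):
    """3D conv -> FC -> attention pooling -> class logits.

    Assumption: soft-attention pooling over positions; the paper's exact
    attention layer may differ.
    """
    fmap = relu(conv3d_valid(clip, p["w_conv"], p["b_conv"]))  # (C, T', H', W')
    feats = fmap.reshape(fmap.shape[0], -1).T                  # (N positions, C)
    hidden = relu(feats @ p["w_fc"] + p["b_fc"])               # (N, D)
    alpha = softmax(hidden @ p["v_att"])                       # (N,) attention weights
    context = alpha @ hidden                                   # (D,) weighted sum
    logits = context @ p["w_cls"] + p["b_cls"]                 # (num_classes,)
    return logits, alpha


rng = np.random.default_rng(0)
C_IN, C_OUT, D, NUM_CLASSES = 3, 8, 16, 101  # UCF-101 has 101 action classes
params = {
    "w_conv": rng.normal(0, 0.1, (C_OUT, C_IN, 3, 3, 3)),
    "b_conv": np.zeros(C_OUT),
    "w_fc": rng.normal(0, 0.1, (C_OUT, D)),
    "b_fc": np.zeros(D),
    "v_att": rng.normal(0, 0.1, D),
    "w_cls": rng.normal(0, 0.1, (D, NUM_CLASSES)),
    "b_cls": np.zeros(NUM_CLASSES),
}
clip = rng.normal(size=(C_IN, 8, 16, 16))  # 8 RGB frames of 16x16 pixels
logits, alpha = forward(clip, params)
print(logits.shape, alpha.shape, float(alpha.sum()))
```

The weights in `alpha` show which spatio-temporal positions the pooled representation draws on; in a trained model they would ideally concentrate on the regions where the action takes place.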

Image Captioning using Multiple Transformers for Self-Attention Mechanism

Feb 14, 2021

Farrukh Olimov, Shikha Dubey, Labina Shrestha, Tran Trung Tin, Moongu Jeon

Abstract: Achieving real-time image captioning with adequate precision is the main challenge in this research field. The present work, Multiple Transformers for Self-Attention Mechanism (MTSM), uses multiple transformers to address this challenge. MTSM first acquires region proposals using a transformer-based detector (DETR). It then realizes the self-attention mechanism by passing these region proposals, together with their visual and geometric features, through another transformer, which learns the local and global interconnections among the objects. Qualitative and quantitative results of MTSM are reported on the MSCOCO dataset.
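The self-attention step can be sketched as single-head scaled dot-product attention over region tokens. This is an illustration under stated assumptions, not MTSM itself: each token is assumed to be a region's visual feature vector concatenated with simple box geometry (DETR-style normalized cx, cy, w, h plus the box area), and the hypothetical helpers `box_geometry` and `self_attention`, the dimensions, and the random projection matrices stand in for a full multi-layer, multi-head transformer.

```python
import numpy as np


def box_geometry(boxes):
    """Hypothetical geometric features from (N, 4) normalized (cx, cy, w, h) boxes."""
    cx, cy, w, h = boxes.T
    return np.stack([cx, cy, w, h, w * h], axis=1)  # (N, 5); last column is area


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over N region tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])          # (N, N) pairwise region affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row is a softmax
    return weights @ v, weights                     # context vectors, attention map


rng = np.random.default_rng(0)
N_REGIONS, D_VIS, D_MODEL = 5, 32, 16
visual = rng.normal(size=(N_REGIONS, D_VIS))        # assumed per-region visual features
boxes = rng.uniform(0.1, 0.9, size=(N_REGIONS, 4))  # assumed (cx, cy, w, h), normalized
tokens = np.concatenate([visual, box_geometry(boxes)], axis=1)  # (N, D_VIS + 5)
d_in = tokens.shape[1]
wq, wk, wv = (rng.normal(0, d_in ** -0.5, (d_in, D_MODEL)) for _ in range(3))
context, weights = self_attention(tokens, wq, wk, wv)
print(context.shape, weights.shape)
```

Because no positional encoding is added, reordering the regions simply reorders the outputs, so any spatial relationship between objects has to come from the geometric features.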