Ilias Papastratis

Multi-manifold Attention for Vision Transformers

Jul 18, 2022

Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras

Abstract: Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although the performance of Vision Transformers has been greatly improved by employing Convolutional Neural Networks, hierarchical structures and compact forms, there is limited research on ways to utilize additional data representations to refine the attention map derived from the multi-head attention of a Transformer network. This work proposes a novel attention mechanism, called multi-manifold attention, that can substitute any standard attention mechanism in a Transformer-based network. The proposed attention models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, which have different statistical and geometrical properties. This guides the network to take into consideration a rich set of information that describes the appearance, color and texture of an image when computing a highly descriptive attention map. In this way, a Vision Transformer with the proposed attention is guided to become more attentive towards discriminative features, leading to improved classification results, as shown by the experimental results on several well-known image classification datasets.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
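
For context, the abstract above says the proposed mechanism can substitute any standard attention mechanism. The sketch below shows only that standard baseline, a scaled dot-product attention map, which is a plausible reading of the Euclidean case. It is not the paper's multi-manifold attention: the Symmetric Positive Definite and Grassmann branches are not shown, and the function names and toy token values are assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Standard scaled dot-product attention map: softmax(Q K^T / sqrt(d)).

    This is the baseline attention, not the multi-manifold variant.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    scores = [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


# Hypothetical token embeddings (three tokens, dimension two).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = attention_map(tokens, tokens)
```

Each row of `attn` is a probability distribution over the tokens, which is the map that the abstract describes refining with additional data representations.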

Ablation study of self-supervised learning for image classification

Dec 04, 2021

Ilias Papastratis

Abstract: This project focuses on the self-supervised training of convolutional neural networks (CNNs) and transformer networks for the task of image recognition. A simple siamese network with different backbones is used to maximize the similarity between two augmented views of the same source image. In this way, the backbone is able to learn visual information without supervision. Finally, the method is evaluated on three image recognition datasets.
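
The objective described in the abstract above, maximizing the similarity of two augmented views, is commonly written as a negative cosine similarity between the two embeddings. The sketch below is a minimal illustration under that assumption; the abstract does not state which similarity measure the project uses, and the function names and embedding values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def siamese_loss(z1, z2):
    """Negative cosine similarity: minimizing it maximizes similarity."""
    return -cosine_similarity(z1, z2)


# Hypothetical embeddings produced by the backbone for two views.
view_a = [1.0, 2.0, 3.0]
view_b = [2.0, 4.0, 6.0]      # same direction as view_a
unrelated = [3.0, 0.0, -1.0]  # orthogonal to view_a

loss_aligned = siamese_loss(view_a, view_b)
loss_unrelated = siamese_loss(view_a, unrelated)
```

Under this assumed loss, embeddings that point in the same direction reach the minimum value of -1, while orthogonal embeddings give a loss of 0, so training pushes the two views of an image toward the same direction in embedding space.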

A Comprehensive Study on Sign Language Recognition Methods

Jul 24, 2020

Nikolas Adaloglou, Theocharis Chatzis, Ilias Papastratis, Andreas Stergioulas, Georgios Th. Papadopoulos, Vassia Zacharopoulou, George J. Xydopoulos, Klimnis Atzakas, Dimitris Papazachariou, Petros Daras

Abstract: In this paper, a comparative experimental assessment of computer vision-based methods for sign language recognition is conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the present study is to provide insights into sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a plethora of pretraining schemes is thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created. To the best of our knowledge, this is the first sign language dataset where sentence and gloss level annotations are provided for a video capture.
