Estefanía Talavera

CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers

Feb 06, 2024

Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera

Abstract: The Transformer architecture has been shown to be a powerful tool for a wide range of tasks. It is based on the self-attention mechanism, an inherently expensive operation with quadratic computational complexity: memory usage and compute time grow quadratically with the length of the input sequence, which limits the application of Transformers. In this work, we propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST) to optimize the attention computation and achieve efficient Transformers. CAST uses learnable surrogate tokens to construct a cluster affinity matrix, which is used to cluster the input sequence and generate novel cluster summaries. The self-attention within each cluster is then combined with the summaries of the other clusters, enabling information flow across the entire input sequence. CAST improves efficiency by reducing the complexity from $O(N^2)$ to $O(\alpha N)$, where $N$ is the sequence length and $\alpha$ is a constant determined by the number of clusters and the number of samples per cluster. We show that CAST performs better than or comparably to the baseline Transformers on long-range sequence modeling tasks, while also achieving greater time and memory efficiency than other efficient Transformers.
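
The mechanism described in the abstract can be illustrated with a simplified NumPy sketch. It is not the authors' implementation: the affinity (query-surrogate dot products), the top-κ grouping via np.argpartition, the affinity-weighted cluster summaries, and the fallback for tokens that no cluster selects are all assumptions made for illustration, and `cast_attention_sketch` is a hypothetical name. Each of the C clusters runs full attention over its κ members and also attends to the summaries of the other clusters, so the per-cluster cost depends on κ and C rather than on N.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cast_attention_sketch(q, k, v, surrogates, cluster_size):
    """Hypothetical, simplified clustering attention inspired by CAST.

    q, k, v: (N, d) projections; surrogates: (C, d) stand-ins for the learnable
    surrogate tokens; cluster_size: tokens sampled per cluster (kappa <= N).
    """
    n, d = q.shape
    c = surrogates.shape[0]
    scale = 1.0 / np.sqrt(d)
    # Cluster affinity matrix (N, C); query-surrogate dot products are an assumption.
    affinity = q @ surrogates.T * scale
    # Each cluster keeps its top-`cluster_size` tokens (unordered), shape (C, kappa).
    members = np.argpartition(-affinity, cluster_size - 1, axis=0)[:cluster_size].T
    # Cluster summaries: affinity-weighted average of member values, shape (C, d).
    weights = softmax(np.take_along_axis(affinity.T, members, axis=1), axis=1)
    summaries = np.einsum("ck,ckd->cd", weights, v[members])

    out = np.zeros_like(v)
    counts = np.zeros((n, 1))
    for j in range(c):
        idx = members[j]
        # Full self-attention restricted to this cluster: O(kappa^2).
        intra = softmax(q[idx] @ k[idx].T * scale) @ v[idx]
        # Attention over the other clusters' summaries: O(kappa * C).
        inter = 0.0
        if c > 1:
            others = np.delete(summaries, j, axis=0)
            inter = softmax(q[idx] @ others.T * scale) @ others
        out[idx] += intra + inter
        counts[idx] += 1

    # Tokens no cluster selected attend to all summaries (assumed fallback).
    uncovered = counts[:, 0] == 0
    if uncovered.any():
        out[uncovered] = softmax(q[uncovered] @ summaries.T * scale) @ summaries
        counts[uncovered] = 1
    # Tokens selected by several clusters get the average of their outputs.
    return out / counts
```

With a single cluster that covers all N tokens, the sketch reduces to ordinary softmax attention, which makes a convenient sanity check. In general its work is O(NC + Cκ² + C²κ), which is linear in N for fixed C and κ, in the spirit of the $O(\alpha N)$ claim above.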

HR-Crime: Human-Related Anomaly Detection in Surveillance Videos

Jul 31, 2021

Kayleigh Boekhoudt, Alina Matei, Maya Aghaei, Estefanía Talavera

Abstract: The automatic detection of anomalies captured in surveillance footage is essential for speeding up what is otherwise a laborious process. To date, UCF-Crime is the largest available dataset for automatic visual analysis of anomalies; it consists of real-world crime scenes of various categories. In this paper, we introduce HR-Crime, a subset of the UCF-Crime dataset suitable for human-related anomaly detection tasks. We rely on state-of-the-art techniques to build the feature extraction pipeline for human-related anomaly detection. Furthermore, we present a baseline anomaly detection analysis on HR-Crime. HR-Crime, the developed feature extraction pipeline, and the extracted features will be made publicly available for further research in the field.

* Accepted by CAIP 2021

Visual Summary of Egocentric Photostreams by Representative Keyframes

May 06, 2015

Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró-i-Nieto, Petia Radeva

Abstract: Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g. memory reinforcement). In this paper, we propose a new summarization method based on keyframe selection that uses visual features extracted by a convolutional neural network. Our method applies unsupervised clustering to divide the photostreams into events, and then extracts the most relevant keyframe for each event. We assess the results with a blind taste test in which a group of 20 people rated the quality of the summaries.

* Paper accepted at the IEEE First International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX), Turin, Italy, July 3, 2015
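
The abstract names the pipeline (CNN features, unsupervised clustering into events, one keyframe per event) but not the clustering algorithm or the relevance criterion. The NumPy sketch below therefore swaps in two simple assumptions: events are cut wherever the cosine distance between consecutive frames exceeds a threshold, and the "most relevant" keyframe is the frame closest to its event's mean feature. `segment_events`, `select_keyframes`, and `threshold` are hypothetical names, and the input would be per-frame CNN features (a toy array in the check below).

```python
import numpy as np


def segment_events(features, threshold):
    """Split a time-ordered photostream (one feature row per frame) into events.

    Stand-in clustering (assumption): start a new event whenever the cosine
    distance between consecutive frame features exceeds `threshold`.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    dist = 1.0 - np.sum(f[1:] * f[:-1], axis=1)  # cosine distance between neighbours
    cuts = np.flatnonzero(dist > threshold) + 1  # first frame index of each new event
    return np.split(np.arange(len(features)), cuts)


def select_keyframes(features, events):
    """Per event, pick the frame closest to the event's mean feature (assumed criterion)."""
    keyframes = []
    for idx in events:
        centroid = features[idx].mean(axis=0)
        d = np.linalg.norm(features[idx] - centroid, axis=1)
        keyframes.append(int(idx[np.argmin(d)]))
    return keyframes
```

Because frames are time-ordered, this stand-in only merges neighbouring frames. A general clustering method could also group visually similar frames that are far apart in time, which may or may not suit a given lifelogging application.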
