Abstract: In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and we present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms: acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training, and evaluation code is made publicly available to facilitate further research.
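To make the temporal-then-spatial factorization concrete, the following is a minimal sketch in PyTorch, not the paper's reference implementation. It assumes temporal patches of a single acquisition, a day-of-year lookup table for the acquisition-time-specific positional encodings, and one learnable class token per class; all layer sizes and names (e.g. `TSViTSketch`, `patch`, `doy`) are illustrative assumptions.

```python
import torch
import torch.nn as nn


def encoder(d, heads, depth):
    """Stack of standard pre-norm Transformer encoder layers."""
    layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=4 * d,
                                       batch_first=True, norm_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)


class TSViTSketch(nn.Module):
    def __init__(self, in_ch=4, d=128, patch=4, depth=2, heads=4, n_cls=10):
        super().__init__()
        # Tokenizer: non-overlapping spatial patches -> d-dim tokens
        # (temporal patch length of one acquisition, for simplicity).
        self.tokenize = nn.Conv2d(in_ch, d, kernel_size=patch, stride=patch)
        # Acquisition-time-specific temporal positional encodings:
        # looked up by calendar day of year, not by sequence index.
        self.temp_pos = nn.Embedding(367, d)
        # Multiple learnable class tokens, one per class.
        self.cls_tokens = nn.Parameter(torch.randn(n_cls, d) * 0.02)
        self.temporal_encoder = encoder(d, heads, depth)  # across acquisitions
        self.spatial_encoder = encoder(d, heads, depth)   # across locations
        self.head = nn.Linear(d, 1)  # per-class, per-patch score

    def forward(self, x, doy):
        # x: (B, T, C, H, W) image time series; doy: (B, T) days of year
        B, T, _, _, _ = x.shape
        tok = self.tokenize(x.flatten(0, 1))           # (B*T, d, h, w)
        N = tok.shape[-2] * tok.shape[-1]              # patches per image
        tok = tok.flatten(2).transpose(1, 2)           # (B*T, N, d)
        tok = tok.view(B, T, N, -1) + self.temp_pos(doy)[:, :, None]
        # Temporal stage first: each patch location is treated as an
        # independent sequence over the T acquisitions.
        tok = tok.permute(0, 2, 1, 3).reshape(B * N, T, -1)
        K = self.cls_tokens.shape[0]
        cls = self.cls_tokens[None].expand(B * N, K, -1)
        tok = self.temporal_encoder(torch.cat([cls, tok], dim=1))
        cls = tok[:, :K]                               # keep the class tokens
        # Spatial stage second: per class, attend across the N locations.
        cls = cls.view(B, N, K, -1).permute(0, 2, 1, 3).reshape(B * K, N, -1)
        cls = self.spatial_encoder(cls).view(B, K, N, -1)
        return self.head(cls).squeeze(-1)              # (B, K, N) logits
```

Under these assumptions, reshaping the (B, K, N) output back onto the patch grid and upsampling yields dense per-class segmentation maps, while pooling over the N locations instead yields classification logits.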