Abstract: Foundation models, i.e., very large deep learning models, have demonstrated impressive performance on a variety of language and vision tasks that are otherwise difficult to match with smaller models. The success of GPT-type language models is particularly exciting and raises expectations about the potential of foundation models in other domains, including satellite remote sensing. In this context, great efforts have been made to build foundation models and test their capabilities in broader applications; examples include Prithvi by NASA-IBM, the Segment Anything Model (SAM), and ViT. This leads to an important question: Are foundation models always a suitable choice for remote sensing tasks, and if not, when? This work aims to improve understanding of the status and suitability of foundation models for pixel-level classification of moderate-resolution multispectral imagery, through comparisons with traditional machine learning (ML) and regular-size deep learning models. Interestingly, the results reveal that in many scenarios traditional ML models still perform similarly to or better than foundation models, especially for tasks where texture contributes little to classification. On the other hand, deep learning models showed more promising results for tasks where labels partially depend on texture (e.g., burn scar mapping), although the performance difference between foundation models and regular-size deep learning models was not pronounced. These results are consistent with our analysis: the suitability of a foundation model depends on the alignment between its self-supervised pretraining task and the actual downstream task, and the typical masked autoencoder paradigm is not necessarily suitable for many remote sensing problems.
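To make the alignment argument concrete, the following is a minimal sketch of the masked autoencoder (MAE) pretext task the abstract refers to; the patch size, mask ratio, and band count are illustrative assumptions chosen for clarity, not details from the paper.

```python
# Illustrative sketch (not the paper's code) of MAE-style pretraining on a
# multispectral tile: tokenize into patches, hide most of them, and task the
# model with reconstructing the hidden patches.
import numpy as np

rng = np.random.default_rng(0)

B, H, W, P = 6, 224, 224, 16          # bands, height, width, patch size (assumed)
tile = rng.random((B, H, W)).astype(np.float32)

# Tokenize the tile into non-overlapping P x P patches (ViT-style tokens).
n_h, n_w = H // P, W // P
patches = tile.reshape(B, n_h, P, n_w, P).transpose(1, 3, 0, 2, 4)
patches = patches.reshape(n_h * n_w, B * P * P)   # (num_tokens, token_dim)

# Randomly mask 75% of tokens; the encoder sees only the visible 25%, and
# the decoder is trained to reconstruct the masked tokens.
mask_ratio = 0.75
n_tokens = patches.shape[0]
n_masked = int(mask_ratio * n_tokens)
perm = rng.permutation(n_tokens)
visible, masked = perm[n_masked:], perm[:n_masked]

encoder_input = patches[visible]          # what the model actually observes
reconstruction_target = patches[masked]   # what the pretext loss is computed on
print(encoder_input.shape, reconstruction_target.shape)
```

Because this objective rewards reconstructing spatial patterns from context, its representations transfer best to downstream tasks where texture carries label information; for per-pixel spectral classification at moderate resolution, the pretext and downstream tasks can be misaligned, which is the point the abstract raises.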
Abstract: Crop type classification using satellite observations is an important tool for providing insights about planted area and enabling estimates of crop condition and yield, especially within the growing season, when uncertainties around these quantities are highest. As the climate changes and extreme weather events become more frequent, these methods must be resilient to domain shifts that may occur, for example, due to shifts in planting timelines. In this work, we present an approach for within-season crop type classification using moderate-spatial-resolution (30 m) satellite data that addresses domain shift related to planting timelines by normalizing inputs by crop growth stage. We use a neural network combining convolutional and recurrent layers to predict whether a pixel contains corn, soybeans, or another crop or land cover type. We evaluated this method on the 2019 growing season in the midwestern US, during which planting was delayed by as much as 1-2 months due to extreme weather that caused record flooding. We show that our approach using growth stage-normalized time series outperforms fixed-date time series, achieving an overall classification accuracy of 85.4% prior to harvest (September-November) and 82.8% by mid-season (July-September).
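The growth-stage normalization idea can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's implementation: the function name `growth_stage_normalize`, the NDVI-like response curve, and the availability of a per-date growth-stage fraction (which in practice would come from an auxiliary source such as crop progress reports).

```python
# Toy sketch: re-indexing a pixel's time series from calendar dates to crop
# growth stage, so a season with delayed planting lines up with a normal one.
import numpy as np

def curve(stage):
    # Toy NDVI-like response as a function of growth-stage fraction in [0, 1].
    return np.exp(-((stage - 0.5) ** 2) / 0.02)

def growth_stage_normalize(values, stage_fraction, grid):
    # Interpolate observed values onto a fixed growth-stage grid.
    # stage_fraction must be increasing (stage progresses through the season).
    return np.interp(grid, stage_fraction, values)

t = np.linspace(0.0, 1.0, 20)     # normalized calendar time over the season
stage_normal = t                  # planted on time: full development observed
stage_late = 0.75 * t             # planting delayed: crop develops "behind"
ndvi_normal = curve(stage_normal)
ndvi_late = curve(stage_late)

# On a fixed-date grid the two seasons disagree badly (the phenology curve
# is shifted), which is the domain shift a fixed-date classifier inherits...
print("max gap, fixed-date grid:   ", np.abs(ndvi_normal - ndvi_late).max())

# ...but re-indexed by growth stage, the same crop looks the same.
grid = np.linspace(0.0, 0.75, 16)
a = growth_stage_normalize(ndvi_normal, stage_normal, grid)
b = growth_stage_normalize(ndvi_late, stage_late, grid)
print("max gap, growth-stage grid: ", np.abs(a - b).max())
```

The design point is that the classifier (here, the convolutional-recurrent network the abstract describes) consumes the stage-indexed series, so a planting delay of 1-2 months changes only where the season's observations fall on the stage grid, not the shape of the input the model sees.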