Abstract:When we are primarily interested in solving several problems jointly with a given prescribed high performance accuracy for each target application, then Foundation Models should for most cases be used rather than problem-specific models. We focus on the specific Computer Vision application of Foundation Models for Earth Observation (EO) and geospatial AI. These models can solve important problems we are tackling, including for example land cover classification, crop type mapping, flood segmentation, building density estimation, and road regression segmentation. In this paper, we show that for a limited number of labelled data, Foundation Models achieve improved performance compared to problem-specific models. In this work, we also present our proposed evaluation benchmark for Foundation Models for EO. Benchmarking the generalization performance of Foundation Models is important as it has become difficult to standardize a fair comparison across the many different models that have been proposed recently. We present the results using our evaluation benchmark for EO Foundation Models and show that Foundation Models are label efficient in the downstream tasks and help us solve problems we are tackling in EO and remote sensing.
Abstract:Confidence assessments of semantic segmentation algorithms in remote sensing are important. It is a desirable property of models to a priori know if they produce an incorrect output. Evaluations of the confidence assigned to the estimates of models for the task of classification in Earth Observation (EO) are crucial as they can be used to achieve improved semantic segmentation performance and prevent high error rates during inference and deployment. The model we develop, the Confidence Assessments of classification algorithms for Semantic segmentation (CAS) model, performs confidence evaluations at both the segment and pixel levels, and outputs both labels and confidence. The outcome of this work has important applications. The main application is the evaluation of EO Foundation Models on semantic segmentation downstream tasks, in particular land cover classification using satellite Copernicus Sentinel-2 data. The evaluation shows that the proposed model is effective and outperforms other alternative baseline models.
Abstract:Nowadays the accurate geo-localization of ground-view images has an important role across domains as diverse as journalism, forensics analysis, transports, and Earth Observation. This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data. This is done by comparing the features from a ground-view image and a satellite one, innovatively leveraging the corresponding latter's segmentation mask through a three-stream Siamese-like network. The proposed method, Semantic Align Net (SAN), focuses on limited Field-of-View (FoV) and ground panorama images (images with a FoV of 360{\deg}). The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images. This work shows how SAN through semantic analysis of images improves the performance on the unlabelled CVUSA dataset for all the tested FoVs.
Abstract:Data from satellites or aerial vehicles are most of the times unlabelled. Annotating such data accurately is difficult, requires expertise, and is costly in terms of time. Even if Earth Observation (EO) data were correctly labelled, labels might change over time. Learning from unlabelled data within a semi-supervised learning framework for segmentation of aerial images is challenging. In this paper, we develop a new model for semantic segmentation of unlabelled images, the Non-annotated Earth Observation Semantic Segmentation (NEOS) model. NEOS performs domain adaptation as the target domain does not have ground truth semantic segmentation masks. The distribution inconsistencies between the target and source domains are due to differences in acquisition scenes, environment conditions, sensors, and times. Our model aligns the learned representations of the different domains to make them coincide. The evaluation results show that NEOS is successful and outperforms other models for semantic segmentation of unlabelled data.
Abstract:Massive amounts of unlabelled data are captured by Earth Observation (EO) satellites, with the Sentinel-2 constellation generating 1.6 TB of data daily. This makes Remote Sensing a data-rich domain well suited to Machine Learning (ML) solutions. However, a bottleneck in applying ML models to EO is the lack of annotated data as annotation is a labour-intensive and costly process. As a result, research in this domain has focused on Self-Supervised Learning and Foundation Model approaches. This paper addresses the need to evaluate different Foundation Models on a fair and uniform benchmark by introducing the PhilEO Bench, a novel evaluation framework for EO Foundation Models. The framework comprises of a testbed and a novel 400 GB Sentinel-2 dataset containing labels for three downstream tasks, building density estimation, road segmentation, and land cover classification. We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE, at multiple n-shots and convergence rates.
Abstract:Nowadays, classification and Out-of-Distribution (OoD) detection in the few-shot setting remain challenging aims due to rarity and the limited samples in the few-shot setting, and because of adversarial attacks. Accomplishing these aims is important for critical systems in safety, security, and defence. In parallel, OoD detection is challenging since deep neural network classifiers set high confidence to OoD samples away from the training data. To address such limitations, we propose the Few-shot ROBust (FROB) model for classification and few-shot OoD detection. We devise FROB for improved robustness and reliable confidence prediction for few-shot OoD detection. We generate the support boundary of the normal class distribution and combine it with few-shot Outlier Exposure (OE). We propose a self-supervised learning few-shot confidence boundary methodology based on generative and discriminative models. The contribution of FROB is the combination of the generated boundary in a self-supervised learning manner and the imposition of low confidence at this learned boundary. FROB implicitly generates strong adversarial samples on the boundary and forces samples from OoD, including our boundary, to be less confident by the classifier. FROB achieves generalization to unseen OoD with applicability to unknown, in the wild, test sets that do not correlate to the training datasets. To improve robustness, FROB redesigns OE to work even for zero-shots. By including our boundary, FROB reduces the threshold linked to the model's few-shot robustness; it maintains the OoD performance approximately independent of the number of few-shots. The few-shot robustness analysis evaluation of FROB on different sets and on One-Class Classification (OCC) data shows that FROB achieves competitive performance and outperforms benchmarks in terms of robustness to the outlier few-shot sample population and variability.
Abstract:Generative models trained in an unsupervised manner may set high likelihood and low reconstruction loss to Out-of-Distribution (OoD) samples. This increases Type II errors and leads to missed anomalies, overall decreasing Anomaly Detection (AD) performance. In addition, AD models underperform due to the rarity of anomalies. To address these limitations, we propose the OoD Minimum Anomaly Score GAN (OMASGAN). OMASGAN generates, in a negative data augmentation manner, anomalous samples on the estimated distribution boundary. These samples are then used to refine an AD model, leading to more accurate estimation of the underlying data distribution including multimodal supports with disconnected modes. OMASGAN performs retraining by including the abnormal minimum-anomaly-score OoD samples generated on the distribution boundary in a self-supervised learning manner. For inference, for AD, we devise a discriminator which is trained with negative and positive samples either generated (negative or positive) or real (only positive). OMASGAN addresses the rarity of anomalies by generating strong and adversarial OoD samples on the distribution boundary using only normal class data, effectively addressing mode collapse. A key characteristic of our model is that it uses any f-divergence distribution metric in its variational representation, not requiring invertibility. OMASGAN does not use feature engineering and makes no assumptions about the data distribution. The evaluation of OMASGAN on image data using the leave-one-out methodology shows that it achieves an improvement of at least 0.24 and 0.07 points in AUROC on average on the MNIST and CIFAR-10 datasets, respectively, over other benchmark and state-of-the-art models for AD.
Abstract:Generative Adversarial Networks (GAN) are a powerful methodology and can be used for unsupervised anomaly detection, where current techniques have limitations such as the accurate detection of anomalies near the tail of a distribution. GANs generally do not guarantee the existence of a probability density and are susceptible to mode collapse, while few GANs use likelihood to reduce mode collapse. In this paper, we create a GAN-based tail formation model for anomaly detection, the Tail of distribution GAN (TailGAN), to generate samples on the tail of the data distribution and detect anomalies near the support boundary. Using TailGAN, we leverage GANs for anomaly detection and use maximum entropy regularization. Using GANs that learn the probability of the underlying distribution has advantages in improving the anomaly detection methodology by allowing us to devise a generator for boundary samples, and use this model to characterize anomalies. TailGAN addresses supports with disjoint components and achieves competitive performance on images. We evaluate TailGAN for identifying Out-of-Distribution (OoD) data and its performance evaluated on MNIST, CIFAR-10, Baggage X-Ray, and OoD data shows competitiveness compared to methods from the literature.
Abstract:Generative models, such as Generative Adversarial Networks (GANs), have been used for unsupervised anomaly detection. While performance keeps improving, several limitations exist particularly attributed to difficulties at capturing multimodal supports and to the ability to approximate the underlying distribution closer to the tails, i.e. the boundary of the distribution's support. This paper proposes an approach that attempts to alleviate such shortcomings. We propose an invertible-residual-network-based model, the Boundary of Distribution Support Generator (BDSG). GANs generally do not guarantee the existence of a probability distribution and here, we use the recently developed Invertible Residual Network (IResNet) and Residual Flow (ResFlow), for density estimation. These models have not yet been used for anomaly detection. We leverage IResNet and ResFlow for Out-of-Distribution (OoD) sample detection and for sample generation on the boundary using a compound loss function that forces the samples to lie on the boundary. The BDSG addresses non-convex support, disjoint components, and multimodal distributions. Results on synthetic data and data from multimodal distributions, such as MNIST and CIFAR-10, demonstrate competitive performance compared to methods from the literature.