Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Swetava Ganguli

SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets

Sep 26, 2023

Daria Reshetova, Swetava Ganguli, C. V. Krishnakumar Iyer, Vipul Pandey

Abstract:We propose a Self-supervised Anomaly Detection technique, called SeMAnD, to detect geometric anomalies in Multimodal geospatial datasets. Geospatial data comprises of acquired and derived heterogeneous data modalities that we transform to semantically meaningful, image-like tensors to address the challenges of representation, alignment, and fusion of multimodal data. SeMAnD is comprised of (i) a simple data augmentation strategy, called RandPolyAugment, capable of generating diverse augmentations of vector geometries, and (ii) a self-supervised training objective with three components that incentivize learning representations of multimodal data that are discriminative to local changes in one modality which are not corroborated by the other modalities. Detecting local defects is crucial for geospatial anomaly detection where even small anomalies (e.g., shifted, incorrectly connected, malformed, or missing polygonal vector geometries like roads, buildings, landcover, etc.) are detrimental to the experience and safety of users of geospatial applications like mapping, routing, search, and recommendation systems. Our empirical study on test sets of different types of real-world geometric geospatial anomalies across 3 diverse geographical regions demonstrates that SeMAnD is able to detect real-world defects and outperforms domain-agnostic anomaly detection strategies by 4.8-19.7% as measured using anomaly classification AUC. We also show that model performance increases (i) up to 20.4% as the number of input modalities increase and (ii) up to 22.9% as the diversity and strength of training data augmentations increase.

* Extended version of the accepted research track paper at the 31st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2023), Hamburg, Germany. 11 pages, 8 figures, 6 tables

Via

Access Paper or Ask Questions

Self-Supervised Temporal Analysis of Spatiotemporal Data

Apr 25, 2023

Yi Cao, Swetava Ganguli, Vipul Pandey

Abstract:There exists a correlation between geospatial activity temporal patterns and type of land use. A novel self-supervised approach is proposed to stratify landscape based on mobility activity time series. First, the time series signal is transformed to the frequency domain and then compressed into task-agnostic temporal embeddings by a contractive autoencoder, which preserves cyclic temporal patterns observed in time series. The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling of downstream geospatial tasks using deep semantic segmentation. Experiments show that temporal embeddings are semantically meaningful representations of time series data and are effective across different tasks such as classifying residential area and commercial areas.

* Accepted for oral presentation at the 43rd IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023, Pasadena, California. 4 pages and 7 figures

Via

Access Paper or Ask Questions

Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision

Oct 07, 2022

Swetava Ganguli, C. V. Krishnakumar Iyer, Vipul Pandey

Figure 1 for Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision

Figure 2 for Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision

Abstract:Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this work, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth's surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian paths on these nodes. A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries, of the spatial connectivity patterns between tiles and their neighbors implied by the observed Markovian paths. A convolutional, contractive autoencoder is trained to learn compressed representations, called reachability embeddings, of reachability summaries for every tile. Reachability embeddings serve as task-agnostic, feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and result in 4-23% gain in performance, as measured using area under the precision-recall curve (AUPRC) metric, when compared to baseline models that use pixel representations that do not account for the spatial connectivity between tiles. Reachability embeddings transform sequential, spatiotemporal mobility data into semantically meaningful tensor representations that can be combined with other sources of imagery and are designed to facilitate multimodal learning in geospatial computer vision.

* Extended abstract accepted for presentation at BayLearn 2022. 3 pages, 2 figures, 1 table. Abstract based on IEEE MDM 2022 research track paper: arXiv:2110.12521

Via

Access Paper or Ask Questions

Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Oct 24, 2021

Swetava Ganguli, C. V. Krishnakumar Iyer, Vipul Pandey

Figure 1 for Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Figure 2 for Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Figure 3 for Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Figure 4 for Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Abstract:Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this paper, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth's surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian paths on these nodes. A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries, of the spatial connectivity patterns between tiles and their neighbors implied by the observed Markovian paths. A convolutional, contractive autoencoder is trained to learn compressed representations, called reachability embeddings, of reachability summaries for every tile. Reachability embeddings serve as task-agnostic, feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and result in 4-23% gain in performance, while using upto 67% less trajectory data, as measured using area under the precision-recall curve (AUPRC) metric, when compared to baseline models that use pixel representations that do not account for the spatial connectivity between tiles. Reachability embeddings transform sequential, spatiotemporal mobility data into semantically meaningful image-like representations that can be combined with other sources of imagery and are designed to facilitate multimodal learning in geospatial computer vision.

* 14 pages, 6 figures, 2 tables

Via

Access Paper or Ask Questions

Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs

Sep 11, 2021

Xuerong Xiao, Swetava Ganguli, Vipul Pandey

Figure 1 for Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs

Figure 2 for Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs

Abstract:Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to dearth of class-balanced and diverse training data. Conversely, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN), for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can only vary in the channel dimension from the synthesized image and is meant to be a task-specific input. The FLC is modeled as an attribute vector in the latent space of the generated image which controls the contributions of various characteristic attributes germane to the target distribution. Experiments on a GPS trajectories dataset show that the proposed model can accurately generate various forms of spatiotemporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation for computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.

* Extended abstract accepted for presentation at BayLearn 2021. 3 pages, 2 figures

Via

Access Paper or Ask Questions

Trinity: A No-Code AI platform for complex spatial datasets

Jul 01, 2021

C. V. Krishnakumar Iyer, Feili Hou, Henry Wang, Yonghong Wang, Kay Oh, Swetava Ganguli, Vipul Pandey

Figure 1 for Trinity: A No-Code AI platform for complex spatial datasets

Figure 2 for Trinity: A No-Code AI platform for complex spatial datasets

Figure 3 for Trinity: A No-Code AI platform for complex spatial datasets

Figure 4 for Trinity: A No-Code AI platform for complex spatial datasets

Abstract:We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal datasets to make them consumable by standard deep learning models, in this case, Convolutional Neural Networks (CNNs), and giving the ability to formulate disparate problems in a standard way, eg. semantic segmentation. With an intuitive user interface, a feature store that hosts derivatives of complex feature engineering, a deep learning kernel, and a scalable data processing mechanism, Trinity provides a powerful platform for domain experts to share the stage with scientists and engineers in solving business-critical problems. It enables quick prototyping, rapid experimentation and reduces the time to production by standardizing model building and deployment. In this paper, we present our motivation behind Trinity and its design along with showcasing sample applications to motivate the idea of lowering the bar to using AI.

* 12 pages

Via

Access Paper or Ask Questions

VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Dec 08, 2020

Xuerong Xiao, Swetava Ganguli, Vipul Pandey

Figure 1 for VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Figure 2 for VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Figure 3 for VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Figure 4 for VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Abstract:Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to dearth of class-balanced and diverse training data. Conversely, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN), for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can only vary in the channel dimension from the synthesized image and is meant to be a task-specific input. The FLC is modeled as an attribute vector in the latent space of the generated image which controls the contributions of various characteristic attributes germane to the target distribution. An interpretation of the attribute vector to systematically generate synthetic images by varying a chosen binary macroscopic feature is explored. Experiments on a GPS trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation for computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.

* In Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science, Article No. 1, Pages 1-10, November 03, 2020, Seattle, WA, USA
* 10 pages, 4 figures, Peer-reviewed and accepted version of the paper published at the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science (IWCTS 2020)

Via

Access Paper or Ask Questions

GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images

Feb 14, 2019

Swetava Ganguli, Pedro Garzon, Noa Glaser

Figure 1 for GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images

Figure 2 for GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images

Figure 3 for GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images

Figure 4 for GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images

Abstract:Automatically generating maps from satellite images is an important task. There is a body of literature which tries to address this challenge. We created a more expansive survey of the task by experimenting with different models and adding new loss functions to improve results. We created a database of pairs of satellite images and the corresponding map of the area. Our model translates the satellite image to the corresponding standard layer map image using three main model architectures: (i) a conditional Generative Adversarial Network (GAN) which compresses the images down to a learned embedding, (ii) a generator which is trained as a normalizing flow (RealNVP) model, and (iii) a conditional GAN where the generator translates via a series of convolutions to the standard layer of a map and the discriminator input is the concatenation of the real/generated map and the satellite image. Model (iii) was by far the most promising of three models. To improve the results we also added a reconstruction loss and style transfer loss in addition to the GAN losses. The third model architecture produced the best quality of sampled images. In contrast to the other generative model where evaluation of the model is a challenging problem. since we have access to the real map for a given satellite image, we are able to assign a quantitative metric to the quality of the generated images in addition to inspecting them visually. While we are continuing to work on increasing the accuracy of the model, one challenge has been the coarse resolution of the data which upper-bounds the quality of the results of our model. Nevertheless, as will be seen in the results, the generated map is more accurate in the features it produces since the generator architecture demands a pixel-wise image translation/pixel-wise coloring. A video presentation summarizing this paper is available at: https://youtu.be/Ur0flOX-Ji0

Via

Access Paper or Ask Questions

Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty

Feb 13, 2019

Anthony Perez, Swetava Ganguli, Stefano Ermon, George Azzari, Marshall Burke, David Lobell

Figure 1 for Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty

Figure 2 for Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty

Figure 3 for Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty

Figure 4 for Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty

Abstract:Obtaining reliable data describing local poverty metrics at a granularity that is informative to policy-makers requires expensive and logistically difficult surveys, particularly in the developing world. Not surprisingly, the poverty stricken regions are also the ones which have a high probability of being a war zone, have poor infrastructure and sometimes have governments that do not cooperate with internationally funded development efforts. We train a CNN on free and publicly available daytime satellite images of the African continent from Landsat 7 to build a model for predicting local economic livelihoods. Only 5% of the satellite images can be associated with labels (which are obtained from DHS Surveys) and thus a semi-supervised approach using a GAN (similar to the approach of Salimans, et al. (2016)), albeit with a more stable-to-train flavor of GANs called the Wasserstein GAN regularized with gradient penalty(Gulrajani, et al. (2017)) is used. The method of multitask learning is employed to regularize the network and also create an end-to-end model for the prediction of multiple poverty metrics.

* This project was recognized as the best two-person project during the Spring 2017 offering of CS 231N Convolutional Neural Networks for Visual Recognition. Final report of research project conducted by the authors as part of the Sustainability and Artificial Intelligence Laboratory at Stanford University and as part of the Spring 2017 offering of CS 231N Convolutional Neural Networks for Visual Recognition

Via

Access Paper or Ask Questions

Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities

Feb 13, 2019

Jared Dunnmon, Swetava Ganguli, Darren Hau, Brooke Husic

Figure 1 for Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities

Figure 2 for Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities

Figure 3 for Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities

Figure 4 for Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities

Abstract:The ability to obtain accurate food security metrics in developing areas where relevant data can be sparse is critically important for policy makers tasked with implementing food aid programs. As a result, a great deal of work has been dedicated to predicting important food security metrics such as annual crop yields using a variety of methods including simulation, remote sensing, weather models, and human expert input. As a complement to existing techniques in crop yield prediction, this work develops neural network models for predicting the sentiment of Twitter feeds from farming communities. Specifically, we investigate the potential of both direct learning on a small dataset of agriculturally-relevant tweets and transfer learning from larger, well-labeled sentiment datasets from other domains (e.g.~politics) to accurately predict agricultural sentiment, which we hope would ultimately serve as a useful crop yield predictor. We find that direct learning from small, relevant datasets outperforms transfer learning from large, fully-labeled datasets, that convolutional neural networks broadly outperform recurrent neural networks on Twitter sentiment classification, and that these models perform substantially less well on ternary sentiment problems characteristic of practical settings than on binary problems often found in the literature.

Via

Access Paper or Ask Questions