Abstract:The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 20%, 14.29%, and 17.65% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.
Abstract:On-device Deep Neural Networks (DNNs) have recently gained more attention due to the increasing computing power of the mobile devices and the number of applications in Computer Vision (CV), Natural Language Processing (NLP), and Internet of Things (IoTs). Unfortunately, the existing efficient convolutional neural network (CNN) architectures designed for CV tasks are not directly applicable to NLP tasks and the tiny Recurrent Neural Network (RNN) architectures have been designed primarily for IoT applications. In NLP applications, although model compression has seen initial success in on-device text classification, there are at least three major challenges yet to be addressed: adversarial robustness, explainability, and personalization. Here we attempt to tackle these challenges by designing a new training scheme for model compression and adversarial robustness, including the optimization of an explainable feature mapping objective, a knowledge distillation objective, and an adversarially robustness objective. The resulting compressed model is personalized using on-device private training data via fine-tuning. We perform extensive experiments to compare our approach with both compact RNN (e.g., FastGRNN) and compressed RNN (e.g., PRADO) architectures in both natural and adversarial NLP test settings.