Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have seen little application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transfer capability, and real-world applicability.
Abstract: A high-precision feature extraction model is crucial for change detection (CD). Many deep learning-based supervised CD methods learn to recognize change feature patterns from large numbers of labelled bi-temporal images, yet labelling bi-temporal remote sensing images is expensive and time-consuming. We therefore propose a coarse-to-fine semi-supervised CD method based on consistency regularization (C2F-SemiCD), which includes a coarse-to-fine CD network with a multiscale attention mechanism (C2FNet) and a semi-supervised update method. The C2FNet network progressively extracts change features from coarse-grained to fine-grained through multiscale feature fusion, a channel attention mechanism, a spatial attention mechanism, a global context module, a feature refinement module, an initial aggregation module, and a final aggregation module. The semi-supervised update method uses the mean teacher approach: the teacher model's parameters are updated as an exponential moving average (EMA) of the student model's parameters. Through extensive experiments on three datasets and meticulous ablation studies, including crossover experiments across datasets, we verify the significant effectiveness and efficiency of the proposed C2F-SemiCD method. The code is available at: https://github.com/ChengxiHAN/C2F-SemiCDand-C2FNet.
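The mean-teacher EMA update mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation; the decay rate `alpha` and the plain-dict parameter representation are assumptions for clarity.

```python
# Minimal sketch of the mean-teacher EMA update: the teacher's parameters
# track an exponential moving average of the student's parameters.
# `alpha` (decay rate) and the dict-of-floats representation are illustrative.

def ema_update(teacher_params, student_params, alpha=0.99):
    """Return updated teacher parameters: t <- alpha * t + (1 - alpha) * s."""
    return {
        name: alpha * teacher_params[name] + (1.0 - alpha) * student_params[name]
        for name in teacher_params
    }

teacher = {"w": 1.0}
student = {"w": 2.0}
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # 0.9 * 1.0 + 0.1 * 2.0 = 1.1
```

Because `alpha` is close to 1, the teacher evolves smoothly and supplies stable pseudo-targets for the unlabelled images in semi-supervised training.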
Abstract: Benefiting from developments in deep learning, deep-learning-based algorithms employing automatic feature extraction have achieved remarkable performance on the change detection (CD) task. However, the performance of existing deep-learning-based CD methods is hindered by the imbalance between changed and unchanged pixels. To tackle this problem, this article proposes a progressive foreground-balanced sampling strategy, which adds no change information, to help the model accurately learn the features of changed pixels early in training and thereby improve detection performance. Furthermore, we design a discriminative Siamese network, the hierarchical attention network (HANet), which can integrate multiscale features and refine detailed features. The main part of HANet is the HAN module, a lightweight and effective self-attention mechanism. Extensive experiments and ablation studies on two CD datasets with extremely unbalanced labels validate the effectiveness and efficiency of the proposed method.
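One plausible reading of the progressive foreground-balanced sampling strategy above can be sketched as a toy schedule: changed (foreground) pixels are over-sampled early in training, with the bias decaying toward the natural class distribution. The linear schedule and the specific probabilities are assumptions, not the paper's exact formulation.

```python
import random

def foreground_sample_prob(epoch, total_epochs, p_start=0.5, p_end=0.05):
    """Probability of drawing a changed pixel, decaying linearly from
    p_start (balanced early training) to p_end (assumed natural rate)."""
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    return p_start + t * (p_end - p_start)

def draw_pixel(changed_idx, unchanged_idx, epoch, total_epochs, rng=random):
    """Pick one training pixel index according to the current schedule."""
    if rng.random() < foreground_sample_prob(epoch, total_epochs):
        return rng.choice(changed_idx)
    return rng.choice(unchanged_idx)

print(foreground_sample_prob(0, 10))  # 0.5 at the start
print(foreground_sample_prob(9, 10))  # 0.05 at the end
```

The point of such a schedule is that the early over-sampling counteracts the extreme label imbalance without synthesizing any new change information.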
Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there remains much room for improvement in precise detection, particularly regarding the edge integrity and internal holes of change features. To address these problems, we design the Change Guiding Network (CGNet) to tackle the insufficient expression of change features in the conventional U-Net structure adopted by previous methods, which causes inaccurate edge detection and internal holes. Change maps generated from deep features with rich semantic information are used as prior information to guide multi-scale feature fusion, improving the expressive ability of change features. Meanwhile, we propose a self-attention module named the Change Guide Module (CGM), which effectively captures long-distance dependencies among pixels and overcomes the insufficient receptive field of traditional convolutional neural networks. On four major CD datasets, extensive experiments and ablation studies verify the usefulness, efficiency, and effectiveness of CGNet. Our code is open-sourced at https://github.com/ChengxiHAN/CGNet-CD.
Abstract: High-spectral-resolution imagery of the Earth's surface enables users to monitor changes over time at a fine-grained scale, playing an increasingly important role in agriculture, defense, and emergency response. However, most current algorithms are still confined to describing local features and fail to incorporate a global perspective, which limits their ability to capture interactions between global features and thus often results in incomplete change regions. In this paper, we propose a Global Multi-head INteractive self-attention change Detection network (GlobalMind) to explore the implicit correlations between different surface objects and varied land cover transformations, acquiring a comprehensive understanding of the data and accurate change detection results. First, a simple but effective Global Axial Segmentation (GAS) strategy is designed to expand the self-attention computation along the row space or column space of hyperspectral images, enabling global connections with high efficiency. Second, building on GAS, a global spatial multi-head interactive self-attention (Global-M) module is crafted to mine abundant spatial-spectral features, including potential correlations between ground objects, from the entire rich and complex hyperspectral space. Moreover, to acquire accurate and complete cross-temporal changes, we devise a global temporal interactive multi-head self-attention (Global-D) module that incorporates the relevance and variation of bi-temporal spatial-spectral features, deriving integrated potential changes of the same kind over local and global ranges in combination with GAS. We perform extensive experiments on five widely used hyperspectral datasets, and our method outperforms state-of-the-art algorithms in both accuracy and efficiency.
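The axial restriction at the heart of a GAS-style strategy can be illustrated with a small NumPy sketch: each pixel attends only to the other pixels in its own row, so the quadratic attention cost shrinks from O((HW)^2) to O(HW^2) while every pixel still reaches across the full image width. The single-head formulation and shapes below are illustrative assumptions; the paper's multi-head interactive modules are more elaborate.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_rows(x):
    """Self-attention restricted to each row of an (H, W, C) feature map.
    Queries, keys, and values are the raw features of that row (no learned
    projections, for brevity)."""
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(h):
        row = x[i]                            # (W, C)
        scores = row @ row.T / np.sqrt(c)     # (W, W) similarity within the row
        out[i] = softmax(scores, axis=-1) @ row
    return out

x = np.random.rand(4, 5, 3)
y = axial_attention_rows(x)
print(y.shape)  # (4, 5, 3)
```

Applying the same operation along columns gives every pixel an efficient global receptive field after the two passes.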
Abstract: Hyperspectral change detection plays an essential role in monitoring dynamic urban development and precisely detecting fine object evolution and alteration. In this paper, we propose an Efficient Multi-temporal Self-attention Network (EMS-Net) for hyperspectral change detection. The designed EMS module reduces the redundancy of similar, change-free feature maps, computing efficient multi-temporal change information for a precise binary change map. Besides, to exploit the clustering characteristics of change detection, a novel supervised contrastive loss is provided to enhance the compactness of unchanged features. Experiments on two hyperspectral change detection datasets demonstrate the outstanding performance and validity of the proposed method.
Abstract: The fast development of self-supervised learning lowers the barrier to learning feature representations from massive unlabeled data and has triggered a series of studies on change detection in remote sensing images. Challenges in adapting self-supervised learning from natural image classification to remote sensing image change detection arise from the differences between the two tasks: the learned patch-level feature representations are unsatisfactory for pixel-level precise change detection. In this paper, we propose a novel pixel-level self-supervised hyperspectral spatial-spectral understanding network (HyperNet) to accomplish pixel-wise feature representation for effective hyperspectral change detection. Concretely, whole images rather than patches are fed into the network, and the multi-temporal spatial-spectral features are compared pixel by pixel. Instead of processing the two-dimensional imaging space and the spectral response dimension in a hybrid manner, a powerful spatial-spectral attention module is put forward to explore the spatial correlation and discriminative spectral features of multi-temporal hyperspectral images (HSIs) separately. Only positive samples at the same location of bi-temporal HSIs are created and forced to be aligned, with the aim of learning spectral difference-invariant features. Moreover, a new similarity loss function named focal cosine is proposed to address the imbalance between easy and hard positive samples, where the weights of hard samples are enlarged and highlighted to promote network training. Six hyperspectral datasets have been adopted to test the validity and generalization of the proposed HyperNet. Extensive experiments demonstrate the superiority of HyperNet over state-of-the-art algorithms on downstream hyperspectral change detection tasks.
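A focal-style cosine similarity loss of the kind described above can be sketched as follows. The exact formulation and the focusing exponent `gamma` are assumptions; the intent matches the abstract: positive pairs at the same location are pulled together, and harder pairs (lower cosine similarity) receive larger weights.

```python
import numpy as np

def focal_cosine_loss(f1, f2, gamma=2.0, eps=1e-8):
    """f1, f2: (N, C) per-pixel features of bi-temporal positive pairs.
    Per-pair loss: (1 - cos)^gamma * (1 - cos). Hard pairs (small cos) are
    up-weighted; easy pairs (cos near 1) contribute almost nothing."""
    f1n = f1 / (np.linalg.norm(f1, axis=1, keepdims=True) + eps)
    f2n = f2 / (np.linalg.norm(f2, axis=1, keepdims=True) + eps)
    cos = np.sum(f1n * f2n, axis=1)  # cosine similarity per positive pair
    return np.mean((1.0 - cos) ** gamma * (1.0 - cos))

# Identical features give (near) zero loss; dissimilar features give more.
a = np.random.rand(8, 16)
print(focal_cosine_loss(a, a))  # ~0.0
```

The focal weighting serves the same purpose as in focal classification losses: the optimization is not dominated by the many already-aligned easy positives.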
Abstract: Hyperspectral anomalous change detection is a challenging task owing to its emphasis on the dynamics of small and rare objects against prevalent changes. In this paper, we propose a Multi-Temporal spatial-spectral Comparison Network for hyperspectral anomalous change detection (MTC-NET). The whole model is a deep Siamese network that learns, via contrastive learning, the prevalent spectral differences resulting from complex imaging conditions in hyperspectral images. A three-dimensional spatial-spectral attention module is designed to effectively extract spatial semantic information and key spectral differences. The gaps between multi-temporal features are then minimized, boosting the alignment of semantic and spectral features and the suppression of multi-temporal background spectral differences. Experiments on the "Viareggio 2013" datasets demonstrate the effectiveness of the proposed MTC-NET.
Abstract: Characterized by tremendous spectral information, hyperspectral images can detect subtle changes and discriminate various change classes for change detection. Recent research, however, is dominated by hyperspectral binary change detection, which cannot provide fine-grained change class information, and most methods incorporating spectral unmixing for hyperspectral multiclass change detection (HMCD) suffer from neglect of temporal correlation and error accumulation. In this study, we propose an unsupervised Binary Change Guided hyperspectral multiclass change detection Network (BCG-Net) for HMCD, which aims to boost both the multiclass change detection and unmixing results using mature binary change detection approaches. In BCG-Net, a novel partial-Siamese united-unmixing module is designed for multi-temporal spectral unmixing, and a groundbreaking temporal correlation constraint directed by the pseudo-labels of the binary change detection result is developed to guide the unmixing process from the perspective of change detection, encouraging the abundances of unchanged pixels to be more coherent and those of changed pixels more accurate. Moreover, an innovative binary change detection rule is put forward to deal with the problem that the traditional rule is susceptible to numerical values. An iterative optimization of the spectral unmixing and change detection processes is proposed to eliminate accumulated errors and bias propagated from the unmixing results to the change detection results. The experimental results demonstrate that our proposed BCG-Net achieves comparable or even superior multiclass change detection performance among state-of-the-art approaches while gaining better spectral unmixing results at the same time.
Abstract: As the COVID-19 epidemic began to worsen in the first months of 2020, stringent lockdown policies were implemented in numerous cities throughout the world to control human transmission and mitigate its spread. Although the reduction of transportation density inside cities was felt subjectively, there had thus far been no objective, quantitative study of its variation, reflecting intracity population flows and their relationship with lockdown policy stringency, from the viewpoint of high-resolution (sub-1 m) remote sensing images. Accordingly, we provide a quantitative investigation of the transportation density reduction before and after lockdown in six epicenter cities (Wuhan, Milan, Madrid, Paris, New York, and London) during the COVID-19 epidemic, accomplished by extracting vehicles from multi-temporal high-resolution remote sensing images. A novel vehicle detection model combining unsupervised vehicle candidate extraction and deep learning identification is specifically proposed for images with a resolution of 0.5 m. Our results indicate that transportation densities were reduced by an average of approximately 50% (and as much as 75.96%) in these six cities following lockdown. Transportation density reduction rates are also highly correlated with policy stringency, with an R^2 value exceeding 0.83. Even within a specific city, the transportation density changes differed and tended to be distributed in accordance with the city's land-use patterns. Considering that public transportation was mostly reduced or even forbidden, our results indicate that city lockdown policies are effective at limiting human transmission within cities.