Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuyang Li

FlexiMo: A Flexible Remote Sensing Foundation Model

Mar 31, 2025

Xuyang Li, Chenyu Li, Pedram Ghamisi, Danfeng Hong

Abstract:The rapid expansion of multi-source satellite imagery drives innovation in Earth observation, opening unprecedented opportunities for Remote Sensing Foundation Models to harness diverse data. However, many existing models remain constrained by fixed spatial resolutions and patch sizes, limiting their ability to fully exploit the heterogeneous spatial characteristics inherent in satellite imagery. To address these challenges, we propose FlexiMo, a flexible remote sensing foundation model that endows the pre-trained model with the flexibility to adapt to arbitrary spatial resolutions. Central to FlexiMo is a spatial resolution-aware module that employs a parameter-free alignment embedding mechanism to dynamically recalibrate patch embeddings based on the input image's resolution and dimensions. This design not only preserves critical token characteristics and ensures multi-scale feature fidelity but also enables efficient feature extraction without requiring modifications to the underlying network architecture. In addition, FlexiMo incorporates a lightweight channel adaptation module that leverages prior spectral information from sensors. This mechanism allows the model to process images with varying numbers of channels while maintaining the data's intrinsic physical properties. Extensive experiments on diverse multimodal, multi-resolution, and multi-scale datasets demonstrate that FlexiMo significantly enhances model generalization and robustness. In particular, our method achieves outstanding performance across a range of downstream tasks, including scene classification, land cover classification, urban building segmentation, and cloud detection. By enabling parameter-efficient and physically consistent adaptation, FlexiMo paves the way for more adaptable and effective foundation models in real-world remote sensing applications.

Via

Access Paper or Ask Questions

SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning

Feb 21, 2025

Xuyang Li, Romit Maulik

Abstract:Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems--especially those requiring precise and reliable performance--often demand formal stability, and existing DRL approaches typically lack explicit mechanisms to ensure or analyze stability. To address this limitation, we propose SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, our approach enables both stability analysis and interpretability. We demonstrated that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.

Via

Access Paper or Ask Questions

SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model

Dec 26, 2024

Xuyang Li, Danfeng Hong, Chenyu Li, Jocelyn Chanussot

Abstract:Remote Sensing (RS) data contains a wealth of multi-dimensional information crucial for Earth observation. Owing to its vast volume, diverse sources, and temporal properties, RS data is highly suitable for the development of large Visual Foundation Models (VFMs). VFMs act as robust feature extractors, learning from extensive RS data, and are subsequently fine-tuned for deployment in various geoscientific tasks. However, current VFMs in the RS domain are predominantly pretrained and tailored exclusively for specific characteristics of RS imagery, neglecting the potential of utilizing the multi-dimensional properties of RS data. Therefore, in this work, we propose SeaMo, a pioneering visual foundation model that integrates multi-seasonal and multimodal information in the RS field. SeaMo is designed to harness multiple properties of RS data. Within the masked image modeling framework, we employ non-aligned cropping techniques to extract spatial properties, use multi-source inputs for multimodal integration, and incorporate temporal-multimodal fusion blocks for effective assimilation of multi-seasonal data. SeaMo explicitly models the multi-dimensional properties of RS data, making the model more comprehensive, robust, and versatile. We applied SeaMo to several downstream geoscience tasks, which demonstrated exceptional performance. Extensive ablation studies were conducted to validate the model's superiority.

Via

Access Paper or Ask Questions

Mechanics-Informed Autoencoder Enables Automated Detection and Localization of Unforeseen Structural Damage

Feb 23, 2024

Xuyang Li, Hamed Bolandi, Mahdi Masmoudi, Talal Salem, Nizar Lajnef, Vishnu Naresh Boddeti

Abstract:Structural health monitoring (SHM) is vital for ensuring the safety and longevity of structures like buildings and bridges. As the volume and scale of structures and the impact of their failure continue to grow, there is a dire need for SHM techniques that are scalable, inexpensive, operate passively without human intervention, and customized for each mechanical structure without the need for complex baseline models. We present a novel "deploy-and-forget" approach for automated detection and localization of damages in structures. It is based on a synergistic combination of fully passive measurements from inexpensive sensors and a mechanics-informed autoencoder. Once deployed, our solution continuously learns and adapts a bespoke baseline model for each structure, learning from its undamaged state's response characteristics. After learning from just 3 hours of data, it can autonomously detect and localize different types of unforeseen damage. Results from numerical simulations and experiments indicate that incorporating the mechanical characteristics into the variational autoencoder allows for up to 35\% earlier detection and localization of damage over a standard autoencoder. Our approach holds substantial promise for a significant reduction in human intervention and inspection costs and enables proactive and preventive maintenance strategies, thus extending the lifespan, reliability, and sustainability of civil infrastructures.

Via

Access Paper or Ask Questions

SpectralGPT: Spectral Foundation Model

Nov 25, 2023

Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia(+4 more)

Figure 1 for SpectralGPT: Spectral Foundation Model

Figure 2 for SpectralGPT: Spectral Foundation Model

Figure 3 for SpectralGPT: Spectral Foundation Model

Figure 4 for SpectralGPT: Spectral Foundation Model

Abstract:The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.

Via

Access Paper or Ask Questions

UAV Obstacle Avoidance by Human-in-the-Loop Reinforcement in Arbitrary 3D Environment

Apr 07, 2023

Xuyang Li, Jianwu Fang, Kai Du, Kuizhi Mei, Jianru Xue

Abstract:This paper focuses on the continuous control of the unmanned aerial vehicle (UAV) based on a deep reinforcement learning method for a large-scale 3D complex environment. The purpose is to make the UAV reach any target point from a certain starting point, and the flying height and speed are variable during navigation. In this work, we propose a deep reinforcement learning (DRL)-based method combined with human-in-the-loop, which allows the UAV to avoid obstacles automatically during flying. We design multiple reward functions based on the relevant domain knowledge to guide UAV navigation. The role of human-in-the-loop is to dynamically change the reward function of the UAV in different situations to suit the obstacle avoidance of the UAV better. We verify the success rate and average step size on urban, rural, and forest scenarios, and the experimental results show that the proposed method can reduce the training convergence time and improve the efficiency and accuracy of navigation tasks. The code is available on the website https://github.com/Monnalo/UAV_navigation.

* accepted in CCC2023

Via

Access Paper or Ask Questions

Neuro-DynaStress: Predicting Dynamic Stress Distributions in Structural Components

Dec 19, 2022

Hamed Bolandi, Gautam Sreekumar, Xuyang Li, Nizar Lajnef, Vishnu Naresh Boddeti

Abstract:Structural components are typically exposed to dynamic loading, such as earthquakes, wind, and explosions. Structural engineers should be able to conduct real-time analysis in the aftermath or during extreme disaster events requiring immediate corrections to avoid fatal failures. As a result, it is crucial to predict dynamic stress distributions during highly disruptive events in real-time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high complexity and are computationally prohibitive. Therefore, to reduce computational cost while preserving accuracy, a deep learning model, Neuro-DynaStress, is proposed to predict the entire sequence of stress distribution based on finite element simulations using a partial differential equation (PDE) solver. The model was designed and trained to use the geometry, boundary conditions and sequence of loads as input and predict the sequences of high-resolution stress contours. The performance of the proposed framework is compared to finite element simulations using a PDE solver.

* 16 pages, 12 figures. arXiv admin note: text overlap with arXiv:2211.16190

Via

Access Paper or Ask Questions

Physics Informed Neural Network for Dynamic Stress Prediction

Nov 28, 2022

Hamed Bolandi, Gautam Sreekumar, Xuyang Li, Nizar Lajnef, Vishnu Naresh Boddeti

Abstract:Structural failures are often caused by catastrophic events such as earthquakes and winds. As a result, it is crucial to predict dynamic stress distributions during highly disruptive events in real time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high complexity. Therefore, to reduce computational cost while maintaining accuracy, a Physics Informed Neural Network (PINN), PINN-Stress model, is proposed to predict the entire sequence of stress distribution based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into a deep neural network's loss function to incorporate information from measurements and PDEs. The PINN-Stress model can predict the sequence of stress distribution in almost real-time and can generalize better than the model without PINN.

* 14 pages, 13 figures

Via

Access Paper or Ask Questions

NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Aug 26, 2022

Xuyang Li, Hamed Bolandi, Talal Salem, Nizar Lajnef, Vishnu Naresh Boddeti

Figure 1 for NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Figure 2 for NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Figure 3 for NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Figure 4 for NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Abstract:Structural monitoring for complex built environments often suffers from mismatch between design, laboratory testing, and actual built parameters. Additionally, real-world structural identification problems encounter many challenges. For example, the lack of accurate baseline models, high dimensionality, and complex multivariate partial differential equations (PDEs) pose significant difficulties in training and learning conventional data-driven algorithms. This paper explores a new framework, dubbed NeuralSI, for structural identification by augmenting PDEs that govern structural dynamics with neural networks. Our approach seeks to estimate nonlinear parameters from governing equations. We consider the vibration of nonlinear beams with two unknown parameters, one that represents geometric and material variations, and another that captures energy losses in the system mainly through damping. The data for parameter estimation is obtained from a limited set of measurements, which is conducive to applications in structural health monitoring where the exact state of an existing structure is typically unknown and only a limited amount of data samples can be collected in the field. The trained model can also be extrapolated under both standard and extreme conditions using the identified structural parameters. We compare with pure data-driven Neural Networks and other classical Physics-Informed Neural Networks (PINNs). Our approach reduces both interpolation and extrapolation errors in displacement distribution by two to five orders of magnitude over the baselines. Code is available at https://github.com/human-analysis/neural-structural-identification

* ECCV 2022 Workshop on Computer Vision for Civil and Infrastructure Engineering

Via

Access Paper or Ask Questions

Embedding Semantic Hierarchy in Discrete Optimal Transport for Risk Minimization

Apr 30, 2021

Yubin Ge, Site Li, Xuyang Li, Fangfang Fan, Wanqing Xie, Jane You, Xiaofeng Liu

Figure 1 for Embedding Semantic Hierarchy in Discrete Optimal Transport for Risk Minimization

Figure 2 for Embedding Semantic Hierarchy in Discrete Optimal Transport for Risk Minimization

Figure 3 for Embedding Semantic Hierarchy in Discrete Optimal Transport for Risk Minimization

Figure 4 for Embedding Semantic Hierarchy in Discrete Optimal Transport for Risk Minimization

Abstract:The widely-used cross-entropy (CE) loss-based deep networks achieved significant progress w.r.t. the classification accuracy. However, the CE loss can essentially ignore the risk of misclassification which is usually measured by the distance between the prediction and label in a semantic hierarchical tree. In this paper, we propose to incorporate the risk-aware inter-class correlation in a discrete optimal transport (DOT) training framework by configuring its ground distance matrix. The ground distance matrix can be pre-defined following a priori of hierarchical semantic risk. Specifically, we define the tree induced error (TIE) on a hierarchical semantic tree and extend it to its increasing function from the optimization perspective. The semantic similarity in each level of a tree is integrated with the information gain. We achieve promising results on several large scale image classification tasks with a semantic tree structure in a plug and play manner.

* Accepted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021

Via

Access Paper or Ask Questions