Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yimin Zhu

Sparse Deformable Mamba for Hyperspectral Image Classification

Apr 15, 2025

Lincoln Linlin Xu, Yimin Zhu, Zack Dewis, Zhengsen Xu, Motasem Alkayid, Mabel Heffring, Saeid Taleghanidoozdoozan

Abstract:Although Mamba models significantly improve hyperspectral image (HSI) classification, one critical challenge is the difficulty in building the sequence of Mamba tokens efficiently. This paper presents a Sparse Deformable Mamba (SDMamba) approach for enhanced HSI classification, with the following contributions. First, to enhance Mamba sequence, an efficient Sparse Deformable Sequencing (SDS) approach is designed to adaptively learn the ''optimal" sequence, leading to sparse and deformable Mamba sequence with increased detail preservation and decreased computations. Second, to boost spatial-spectral feature learning, based on SDS, a Sparse Deformable Spatial Mamba Module (SDSpaM) and a Sparse Deformable Spectral Mamba Module (SDSpeM) are designed for tailored modeling of the spatial information spectral information. Last, to improve the fusion of SDSpaM and SDSpeM, an attention based feature fusion approach is designed to integrate the outputs of the SDSpaM and SDSpeM. The proposed method is tested on several benchmark datasets with many state-of-the-art approaches, demonstrating that the proposed approach can achieve higher accuracy with less computation, and better detail small-class preservation capability.

Via

Access Paper or Ask Questions

Language-Informed Hyperspectral Image Synthesis for Imbalanced-Small Sample Classification via Semi-Supervised Conditional Diffusion Model

Feb 28, 2025

Yimin Zhu, Linlin Xu

Abstract:Data augmentation effectively addresses the imbalanced-small sample data (ISSD) problem in hyperspectral image classification (HSIC). While most methodologies extend features in the latent space, few leverage text-driven generation to create realistic and diverse samples. Recently, text-guided diffusion models have gained significant attention due to their ability to generate highly diverse and high-quality images based on text prompts in natural image synthesis. Motivated by this, this paper proposes Txt2HSI-LDM(VAE), a novel language-informed hyperspectral image synthesis method to address the ISSD in HSIC. The proposed approach uses a denoising diffusion model, which iteratively removes Gaussian noise to generate hyperspectral samples conditioned on textual descriptions. First, to address the high-dimensionality of hyperspectral data, a universal variational autoencoder (VAE) is designed to map the data into a low-dimensional latent space, which provides stable features and reduces the inference complexity of diffusion model. Second, a semi-supervised diffusion model is designed to fully take advantage of unlabeled data. Random polygon spatial clipping (RPSC) and uncertainty estimation of latent feature (LF-UE) are used to simulate the varying degrees of mixing. Third, the VAE decodes HSI from latent space generated by the diffusion model with the language conditions as input. In our experiments, we fully evaluate synthetic samples' effectiveness from statistical characteristics and data distribution in 2D-PCA space. Additionally, visual-linguistic cross-attention is visualized on the pixel level to prove that our proposed model can capture the spatial layout and geometry of the generated data. Experiments demonstrate that the performance of the proposed Txt2HSI-LDM(VAE) surpasses the classical backbone models, state-of-the-art CNNs, and semi-supervised methods.

Via

Access Paper or Ask Questions

Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Feb 27, 2025

Yimin Zhu, Linlin Xu

Figure 1 for Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Figure 2 for Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Figure 3 for Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Figure 4 for Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Abstract:Although efficient extraction of discriminative spatial-spectral features is critical for hyperspectral images classification (HSIC), it is difficult to achieve these features due to factors such as the spatial-spectral heterogeneity and noise effect. This paper presents a Spatial-Spectral Diffusion Contrastive Representation Network (DiffCRN), based on denoising diffusion probabilistic model (DDPM) combined with contrastive learning (CL) for HSIC, with the following characteristics. First,to improve spatial-spectral feature representation, instead of adopting the UNets-like structure which is widely used for DDPM, we design a novel staged architecture with spatial self-attention denoising module (SSAD) and spectral group self-attention denoising module (SGSAD) in DiffCRN with improved efficiency for spectral-spatial feature learning. Second, to improve unsupervised feature learning efficiency, we design new DDPM model with logarithmic absolute error (LAE) loss and CL that improve the loss function effectiveness and increase the instance-level and inter-class discriminability. Third, to improve feature selection, we design a learnable approach based on pixel-level spectral angle mapping (SAM) for the selection of time steps in the proposed DDPM model in an adaptive and automatic manner. Last, to improve feature integration and classification, we design an Adaptive weighted addition modul (AWAM) and Cross time step Spectral-Spatial Fusion Module (CTSSFM) to fuse time-step-wise features and perform classification. Experiments conducted on widely used four HSI datasets demonstrate the improved performance of the proposed DiffCRN over the classical backbone models and state-of-the-art GAN, transformer models and other pretrained methods. The source code and pre-trained model will be made available publicly.

Via

Access Paper or Ask Questions

Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Nov 21, 2024

Zeyu Dong, Yimin Zhu, Yansong Li, Kevin Mahon, Yu Sun

Figure 1 for Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Figure 2 for Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Figure 3 for Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Figure 4 for Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Abstract:Traditional autonomous driving methods adopt a modular design, decomposing tasks into sub-tasks. In contrast, end-to-end autonomous driving directly outputs actions from raw sensor data, avoiding error accumulation. However, training an end-to-end model requires a comprehensive dataset; otherwise, the model exhibits poor generalization capabilities. Recently, large language models (LLMs) have been applied to enhance the generalization capabilities of end-to-end driving models. Most studies explore LLMs in an open-loop manner, where the output actions are compared to those of experts without direct feedback from the real world, while others examine closed-loop results only in simulations. This paper proposes an efficient architecture that integrates multimodal LLMs into end-to-end driving models operating in closed-loop settings in real-world environments. In our architecture, the LLM periodically processes raw sensor data to generate high-level driving instructions, effectively guiding the end-to-end model, even at a slower rate than the raw sensor data. This architecture relaxes the trade-off between the latency and inference quality of the LLM. It also allows us to choose from a wide variety of LLMs to improve high-level driving instructions and minimize fine-tuning costs. Consequently, our architecture reduces data collection requirements because the LLMs do not directly output actions; we only need to train a simple imitation learning model to output actions. In our experiments, the training data for the end-to-end model in a real-world environment consists of only simple obstacle configurations with one traffic cone, while the test environment is more complex and contains multiple obstacles placed in various positions. Experiments show that the proposed architecture enhances the generalization capabilities of the end-to-end model even without fine-tuning the LLM.

Via

Access Paper or Ask Questions

TactV: A Class of Hybrid Terrestrial/Aerial Coaxial Tilt-Rotor Vehicles

Nov 19, 2024

Yifei Dong, Yimin Zhu, Lixian Zhang, Yihang Ding

Abstract:To enhance the obstacle-crossing and endurance capabilities of vehicles operating in complex environments, this paper presents the design of a hybrid terrestrial/aerial coaxial tilt-rotor vehicle, TactV, which integrates advantages such as lightweight construction and high maneuverability. Unlike existing tandem dual-rotor vehicles, TactV employs a tiltable coaxial dual-rotor design and features a spherical cage structure that encases the body, allowing for omnidirectional movement while further reducing its overall dimensions. To enable TactV to maneuver flexibly in aerial, planar, and inclined surfaces, we established corresponding dynamic and control models for each mode. Additionally, we leveraged TactV's tiltable center of gravity to design energy-saving and high-mobility modes for ground operations, thereby further enhancing its endurance. Experimental designs for both aerial and ground tests corroborated the superiority of TactV's movement capabilities and control strategies.

Via

Access Paper or Ask Questions

Context-Aware Design of Cyber-Physical Human Systems

Jan 07, 2020

Supratik Mukhopadhyay, Qun Liu, Edward Collier, Yimin Zhu, Ravindra Gudishala, Chanachok Chokwitthaya, Robert DiBiano, Alimire Nabijiang, Sanaz Saeidi, Subhajit Sidhanta(+1 more)

Figure 1 for Context-Aware Design of Cyber-Physical Human Systems

Figure 2 for Context-Aware Design of Cyber-Physical Human Systems

Figure 3 for Context-Aware Design of Cyber-Physical Human Systems

Abstract:Recently, it has been widely accepted by the research community that interactions between humans and cyber-physical infrastructures have played a significant role in determining the performance of the latter. The existing paradigm for designing cyber-physical systems for optimal performance focuses on developing models based on historical data. The impacts of context factors driving human system interaction are challenging and are difficult to capture and replicate in existing design models. As a result, many existing models do not or only partially address those context factors of a new design owing to the lack of capabilities to capture the context factors. This limitation in many existing models often causes performance gaps between predicted and measured results. We envision a new design environment, a cyber-physical human system (CPHS) where decision-making processes for physical infrastructures under design are intelligently connected to distributed resources over cyberinfrastructure such as experiments on design features and empirical evidence from operations of existing instances. The framework combines existing design models with context-aware design-specific data involving human-infrastructure interactions in new designs, using a machine learning approach to create augmented design models with improved predictive powers.

* Paper was accepted at the 12th International Conference on Communication Systems and Networks (COMSNETS 2020)

Via

Access Paper or Ask Questions

Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Jun 14, 2019

Chanachok Chokwitthaya, Edward Collier, Yimin Zhu, Supratik Mukhopadhyay

Figure 1 for Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Figure 2 for Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Figure 3 for Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Figure 4 for Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Abstract:Building performance discrepancies between building design and operation are one of the causes that lead many new designs fail to achieve their goals and objectives. One of main factors contributing to the discrepancy is occupant behaviors. Occupants responding to a new design are influenced by several factors. Existing building performance models (BPMs) ignore or partially address those factors (called contextual factors) while developing BPMs. To potentially reduce the discrepancies and improve the prediction accuracy of BPMs, this paper proposes a computational framework for learning mixture models by using Generative Adversarial Networks (GANs) that appropriately combining existing BPMs with knowledge on occupant behaviors to contextual factors in new designs. Immersive virtual environments (IVEs) experiments are used to acquire data on such behaviors. Performance targets are used to guide appropriate combination of existing BPMs with knowledge on occupant behaviors. The resulting model obtained is called an augmented BPM. Two different experiments related to occupant lighting behaviors are shown as case study. The results reveal that augmented BPMs significantly outperformed existing BPMs with respect to achieving specified performance targets. The case study confirms the potential of the computational framework for improving prediction accuracy of BPMs during design.

* 9 pages, 4 figures, The 2019 International Joint Conference on Neural Networks (IJCNN)

Via

Access Paper or Ask Questions

Improving Route Choice Models by Incorporating Contextual Factors via Knowledge Distillation

Mar 27, 2019

Qun Liu, Supratik Mukhopadhyay, Yimin Zhu, Ravindra Gudishala, Sanaz Saeidi, Alimire Nabijiang

Figure 1 for Improving Route Choice Models by Incorporating Contextual Factors via Knowledge Distillation

Figure 2 for Improving Route Choice Models by Incorporating Contextual Factors via Knowledge Distillation

Figure 3 for Improving Route Choice Models by Incorporating Contextual Factors via Knowledge Distillation

Figure 4 for Improving Route Choice Models by Incorporating Contextual Factors via Knowledge Distillation

Abstract:Route Choice Models predict the route choices of travelers traversing an urban area. Most of the route choice models link route characteristics of alternative routes to those chosen by the drivers. The models play an important role in prediction of traffic levels on different routes and thus assist in development of efficient traffic management strategies that result in minimizing traffic delay and maximizing effective utilization of transport system. High fidelity route choice models are required to predict traffic levels with higher accuracy. Existing route choice models do not take into account dynamic contextual conditions such as the occurrence of an accident, the socio-cultural and economic background of drivers, other human behaviors, the dynamic personal risk level, etc. As a result, they can only make predictions at an aggregate level and for a fixed set of contextual factors. For higher fidelity, it is highly desirable to use a model that captures significance of subjective or contextual factors in route choice. This paper presents a novel approach for developing high-fidelity route choice models with increased predictive power by augmenting existing aggregate level baseline models with information on drivers' responses to contextual factors obtained from Stated Choice Experiments carried out in an Immersive Virtual Environment through the use of knowledge distillation.

* Paper was accepted at the 2019 International Joint Conference on Neural Networks (IJCNN 2019)

Via

Access Paper or Ask Questions