Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shun Lu

MASTER: Multimodal Segmentation with Text Prompts

Mar 06, 2025

Fuyang Liu, Shun Lu, Jilin Mei, Yu Hu

Abstract:RGB-Thermal fusion is a potential solution for various weather and light conditions in challenging scenarios. However, plenty of studies focus on designing complex modules to fuse different modalities. With the widespread application of large language models (LLMs), valuable information can be more effectively extracted from natural language. Therefore, we aim to leverage the advantages of large language models to design a structurally simple and highly adaptable multimodal fusion model architecture. We proposed MultimodAl Segmentation with TExt PRompts (MASTER) architecture, which integrates LLM into the fusion of RGB-Thermal multimodal data and allows complex query text to participate in the fusion process. Our model utilizes a dual-path structure to extract information from different modalities of images. Additionally, we employ LLM as the core module for multimodal fusion, enabling the model to generate learnable codebook tokens from RGB, thermal images, and textual information. A lightweight image decoder is used to obtain semantic segmentation results. The proposed MASTER performs exceptionally well in benchmark tests across various automated driving scenarios, yielding promising results.

Via

Access Paper or Ask Questions

PID: Physics-Informed Diffusion Model for Infrared Image Generation

Jul 12, 2024

Fangyuan Mao, Jilin Mei, Shun Lu, Fuyang Liu, Liang Chen, Fangzhou Zhao, Yu Hu

Figure 1 for PID: Physics-Informed Diffusion Model for Infrared Image Generation

Figure 2 for PID: Physics-Informed Diffusion Model for Infrared Image Generation

Figure 3 for PID: Physics-Informed Diffusion Model for Infrared Image Generation

Figure 4 for PID: Physics-Informed Diffusion Model for Infrared Image Generation

Abstract:Infrared imaging technology has gained significant attention for its reliable sensing ability in low visibility conditions, prompting many studies to convert the abundant RGB images to infrared images. However, most existing image translation methods treat infrared images as a stylistic variation, neglecting the underlying physical laws, which limits their practical application. To address these issues, we propose a Physics-Informed Diffusion (PID) model for translating RGB images to infrared images that adhere to physical laws. Our method leverages the iterative optimization of the diffusion model and incorporates strong physical constraints based on prior knowledge of infrared laws during training. This approach enhances the similarity between translated infrared images and the real infrared domain without increasing extra training parameters. Experimental results demonstrate that PID significantly outperforms existing state-of-the-art methods. Our code is available at https://github.com/fangyuanmao/PID.

Via

Access Paper or Ask Questions

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Feb 28, 2023

Shun Lu, Yu Hu, Longxing Yang, Zihao Sun, Jilin Mei, Jianchao Tan, Chengru Song

Figure 1 for PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Figure 2 for PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Figure 3 for PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Figure 4 for PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Abstract:Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models, largely reducing the search cost. However, several works have pointed out that the shared weights suffer from different gradient descent directions during training. And we further find that large gradient variance occurs during supernet training, which degrades the supernet ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). We theoretically derive the relationship between the gradient variance and the sampling distributions, and reveal that the optimal sampling probability is proportional to the normalized gradient norm of path and training data. Hence, we use the normalized gradient norm as the importance indicator for path and training data, and adopt an importance sampling strategy for the supernet training. Our method only requires negligible computation cost for optimizing the sampling distributions of path and data, but achieves lower gradient variance during supernet training and better generalization performance for the supernet, resulting in a more consistent NAS. We conduct comprehensive comparisons with other improved approaches in various search spaces. Results show that our method surpasses others with more reliable ranking performance and higher accuracy of searched architectures, showing the effectiveness of our method. Code is available at https://github.com/ShunLu91/PA-DA.

* To appear in CVPR 2023; we will update the camera-ready version soon

Via

Access Paper or Ask Questions

SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Sep 18, 2021

Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen Yang, Ji Liu

Figure 1 for SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Figure 2 for SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Figure 3 for SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Figure 4 for SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Abstract:Recently, x-vector has been a successful and popular approach for speaker verification, which employs a time delay neural network (TDNN) and statistics pooling to extract speaker characterizing embedding from variable-length utterances. Improvement upon the x-vector has been an active research area, and enormous neural networks have been elaborately designed based on the x-vector, eg, extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we try to identify the optimal architectures from a TDNN based search space employing neural architecture search (NAS), named SpeechNAS. Leveraging the recent advances in the speaker recognition, such as high-order statistics pooling, multi-branch mechanism, D-TDNN and angular additive margin softmax (AAM) loss with a minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, from SpeechNAS-1 to SpeechNAS-5, of various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our derived best neural network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN based state-of-the-art approaches by a large margin. Code and trained weights are in https://github.com/wentaozhu/speechnas.git

* 8 pages, 3 figures, 3 tables. Accepted by ASRU2021

Via

Access Paper or Ask Questions

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Sep 02, 2020

Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun Lu, Xiaolin Wei, Junchi Yan

Figure 1 for DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Figure 2 for DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Figure 3 for DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Figure 4 for DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Abstract:Despite the fast development of differentiable architecture search (DARTS), it suffers from a standing instability issue regarding searching performance, which extremely limits its application. Existing robustifying methods draw clues from the outcome instead of finding out the causing factor. Various indicators such as Hessian eigenvalues are proposed as a signal of performance collapse, and the searching should be stopped once an indicator reaches a preset threshold. However, these methods tend to easily reject good architectures if thresholds are inappropriately set, let alone the searching is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections with a learnable architectural coefficient can easily recover from a disadvantageous state and become dominant. We conjecture that skip connections profit too much from this privilege, hence causing the collapse for the derived model. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. Extensive experiments on various datasets verify that our approach can substantially improve the robustness of DARTS.

* A very robust, efficient and indicator-free approach to address the collapse of differentiable architecture search. Point out some issues of indicator based methods

Via

Access Paper or Ask Questions