Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chenhao Wu

DFR: A Decompose-Fuse-Reconstruct Framework for Multi-Modal Few-Shot Segmentation

Jul 22, 2025

Shuai Chen, Fanman Meng, Xiwei Zhang, Haoran Wei, Chenhao Wu, Qingbo Wu, Hongliang Li

Abstract:This paper presents DFR (Decompose, Fuse and Reconstruct), a novel framework that addresses the fundamental challenge of effectively utilizing multi-modal guidance in few-shot segmentation (FSS). While existing approaches primarily rely on visual support samples or textual descriptions, their single or dual-modal paradigms limit exploitation of rich perceptual information available in real-world scenarios. To overcome this limitation, the proposed approach leverages the Segment Anything Model (SAM) to systematically integrate visual, textual, and audio modalities for enhanced semantic understanding. The DFR framework introduces three key innovations: 1) Multi-modal Decompose: a hierarchical decomposition scheme that extracts visual region proposals via SAM, expands textual semantics into fine-grained descriptors, and processes audio features for contextual enrichment; 2) Multi-modal Contrastive Fuse: a fusion strategy employing contrastive learning to maintain consistency across visual, textual, and audio modalities while enabling dynamic semantic interactions between foreground and background features; 3) Dual-path Reconstruct: an adaptive integration mechanism combining semantic guidance from tri-modal fused tokens with geometric cues from multi-modal location priors. Extensive experiments across visual, textual, and audio modalities under both synthetic and real settings demonstrate DFR's substantial performance improvements over state-of-the-art methods.

* 3 figures

Via

Access Paper or Ask Questions

CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation

Jul 22, 2025

Shuai Chen, Fanman Meng, Chunjin Yang, Haoran Wei, Chenhao Wu, Qingbo Wu, Hongliang Li

Abstract:Cross-Domain Few-Shot Segmentation (CD-FSS) remains challenging due to limited data and domain shifts. Recent foundation models like the Segment Anything Model (SAM) have shown remarkable zero-shot generalization capability in general segmentation tasks, making it a promising solution for few-shot scenarios. However, adapting SAM to CD-FSS faces two critical challenges: reliance on manual prompt and limited cross-domain ability. Therefore, we propose the Composable Meta-Prompt (CMP) framework that introduces three key modules: (i) the Reference Complement and Transformation (RCT) module for semantic expansion, (ii) the Composable Meta-Prompt Generation (CMPG) module for automated meta-prompt synthesis, and (iii) the Frequency-Aware Interaction (FAI) module for domain discrepancy mitigation. Evaluations across four cross-domain datasets demonstrate CMP's state-of-the-art performance, achieving 71.8\% and 74.5\% mIoU in 1-shot and 5-shot scenarios respectively.

* 3 figures

Via

Access Paper or Ask Questions

CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Apr 07, 2025

Shuai Chen, Fanman Meng, Haoran Wei, Chenhao Wu, Qingbo Wu, Linfeng Xu, Hongliang Li

Figure 1 for CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Figure 2 for CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Figure 3 for CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Figure 4 for CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Abstract:Few-shot segmentation (FSS) aims to segment new classes using few annotated images. While recent FSS methods have shown considerable improvements by leveraging Segment Anything Model (SAM), they face two critical limitations: insufficient utilization of structural correlations in query images, and significant information loss when converting continuous position priors to discrete point prompts. To address these challenges, we propose CMaP-SAM, a novel framework that introduces contraction mapping theory to optimize position priors for SAM-driven few-shot segmentation. CMaP-SAM consists of three key components: (1) a contraction mapping module that formulates position prior optimization as a Banach contraction mapping with convergence guarantees. This module iteratively refines position priors through pixel-wise structural similarity, generating a converged prior that preserves both semantic guidance from reference images and structural correlations in query images; (2) an adaptive distribution alignment module bridging continuous priors with SAM's binary mask prompt encoder; and (3) a foreground-background decoupled refinement architecture producing accurate final segmentation masks. Extensive experiments demonstrate CMaP-SAM's effectiveness, achieving state-of-the-art performance with 71.1 mIoU on PASCAL-$5^i$ and 56.1 on COCO-$20^i$ datasets.

* 7 figures

Via

Access Paper or Ask Questions

No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

Jul 23, 2024

Shuai Chen, Fanman Meng, Chenhao Wu, Haoran Wei, Runtong Zhang, Qingbo Wu, Linfeng Xu, Hongliang Li

Figure 1 for No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

Figure 2 for No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

Figure 3 for No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

Figure 4 for No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

Abstract:Few-Shot Segmentation (FSS) aims to segment novel classes using only a few annotated images. Despite considerable process under pixel-wise support annotation, current FSS methods still face three issues: the inflexibility of backbone upgrade without re-training, the inability to uniformly handle various types of annotations (e.g., scribble, bounding box, mask and text), and the difficulty in accommodating different annotation quantity. To address these issues simultaneously, we propose DiffUp, a novel FSS method that conceptualizes the FSS task as a conditional generative problem using a diffusion process. For the first issue, we introduce a backbone-agnostic feature transformation module that converts different segmentation cues into unified coarse priors, facilitating seamless backbone upgrade without re-training. For the second issue, due to the varying granularity of transformed priors from diverse annotation types, we conceptualize these multi-granular transformed priors as analogous to noisy intermediates at different steps of a diffusion model. This is implemented via a self-conditioned modulation block coupled with a dual-level quality modulation branch. For the third issue, we incorporates an uncertainty-aware information fusion module that harmonizing the variability across zero-shot, one-shot and many-shot scenarios. Evaluated through rigorous benchmarks, DiffUp significantly outperforms existing FSS models in terms of flexibility and accuracy.

* 7 figures

Via

Access Paper or Ask Questions

R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

May 19, 2024

Huiying Yang, Zihan Jin, Chenhao Wu, Rujing Xiong, Robert Caiming Qiu, Zenan Ling

Abstract:Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors and the mobility of RISs. In this paper, we propose a novel modeling approach using Neural Radiance Fields (NeRF) to characterize the dynamics of electromagnetic fields in such environments. Our method utilizes NeRF-based ray tracing to intuitively capture and visualize the complex dynamics of signal propagation, effectively modeling the complete signal pathways from the transmitter to the RIS, and from the RIS to the receiver. This two-stage process accurately characterizes multiple complex transmission paths, enhancing our understanding of signal behavior in real-world scenarios. Our approach predicts the signal field for any specified RIS placement and receiver location, facilitating efficient RIS deployment. Experimental evaluations using both simulated and real-world data validate the significant benefits of our methodology.

Via

Access Paper or Ask Questions

On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

May 13, 2024

Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

Figure 1 for On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

Figure 2 for On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

Figure 3 for On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

Figure 4 for On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

Abstract:Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. Adversarial samples in these studies are designed to attack only one dimension of either bitrate or distortion, targeting a submodel with a specific compression ratio. However, adversaries in real-world scenarios are neither confined to singular dimensional attacks nor always have control over compression ratios. This variability highlights the inadequacy of existing research in comprehensively assessing the adversarial robustness of LIC algorithms in practical applications. To tackle this issue, this paper presents two joint rate-distortion attack paradigms at both submodel and algorithm levels, i.e., Specific-ratio Rate-Distortion Attack (SRDA) and Agnostic-ratio Rate-Distortion Attack (ARDA). Additionally, a suite of multi-granularity assessment tools is introduced to evaluate the attack results from various perspectives. On this basis, extensive experiments on eight prominent LIC algorithms are conducted to offer a thorough analysis of their inherent vulnerabilities. Furthermore, we explore the efficacy of two defense techniques in improving the performance under joint rate-distortion attacks. The findings from these experiments can provide a valuable reference for the development of compression algorithms with enhanced adversarial robustness.

Via

Access Paper or Ask Questions

A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure

May 05, 2024

Chenhao Wu

Figure 1 for A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure

Figure 2 for A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure

Figure 3 for A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure

Figure 4 for A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure

Abstract:This work presents a drone detector with modified backbone and multiple pyramid feature maps enhancement structure (MDDPE). Novel feature maps improve modules that uses different levels of information to produce more robust and discriminatory features is proposed. These module includes the feature maps supplement function and the feature maps recombination enhancement function.To effectively handle the drone characteristics, auxiliary supervisions that are implemented in the early stages by employing tailored anchors designed are utilized. To further improve the modeling of real drone detection scenarios and initialization of the regressor, an updated anchor matching technique is introduced to match anchors and ground truth drone as closely as feasible. To show the proposed MDDPE's superiority over the most advanced detectors, extensive experiments are carried out using well-known drone detection benchmarks.

* 20 pages, 10 figures

Via

Access Paper or Ask Questions

Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Sep 07, 2023

Jianyu Wen, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

Figure 1 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 2 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 3 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 4 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Abstract:In this paper, we propose a 2-stage low-light image enhancement method called Self-Reference Deep Adaptive Curve Estimation (Self-DACE). In the first stage, we present an intuitive, lightweight, fast, and unsupervised luminance enhancement algorithm. The algorithm is based on a novel low-light enhancement curve that can be used to locally boost image brightness. We also propose a new loss function with a simplified physical model designed to preserve natural images' color, structure, and fidelity. We use a vanilla CNN to map each pixel through deep Adaptive Adjustment Curves (AAC) while preserving the local image structure. Secondly, we introduce the corresponding denoising scheme to remove the latent noise in the darkness. We approximately model the noise in the dark and deploy a Denoising-Net to estimate and remove the noise after the first stage. Exhaustive qualitative and quantitative analysis shows that our method outperforms existing state-of-the-art algorithms on multiple real-world datasets.

Via

Access Paper or Ask Questions