Zhanchuan Cai

ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts

Mar 30, 2025

Linfeng Tang, Yeda Wang, Zhanchuan Cai, Junjun Jiang, Jiayi Ma

Abstract: Current image fusion methods struggle to address the composite degradations encountered in real-world imaging scenarios and lack the flexibility to accommodate user-specific requirements. In response to these challenges, we propose a controllable image fusion framework with language-vision prompts, termed ControlFusion, which adaptively neutralizes composite degradations. On the one hand, we develop a degraded imaging model that integrates physical imaging mechanisms, including the Retinex theory and the atmospheric scattering principle, to simulate composite degradations, thereby making it possible to address complex real-world degradations at the data level. On the other hand, we devise a prompt-modulated restoration and fusion network that dynamically enhances features with degradation prompts, enabling our method to accommodate composite degradations of varying levels. Specifically, considering that users differ in how they perceive quality, we incorporate a text encoder to embed user-specified degradation types and severity levels as degradation prompts. We also design a spatial-frequency collaborative visual adapter that autonomously perceives degradations in source images, so the method does not depend entirely on user instructions. Extensive experiments demonstrate that ControlFusion outperforms state-of-the-art (SOTA) fusion methods in fusion quality and degradation handling, particularly in countering real-world and compound degradations of various levels.
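
The abstract mentions simulating composite degradations from Retinex theory and the atmospheric scattering principle. The sketch below is only a rough illustration of that general idea, not the authors' degradation model: it uses the textbook forms I = R * L (Retinex) and I = J * t + A * (1 - t) (atmospheric scattering) on a tiny grayscale image. The function names, the order of composition, and the default parameter values are all assumptions made for this example.

```python
from typing import List

# Grayscale image as rows of pixel intensities in [0, 1] (assumed representation).
Image = List[List[float]]


def darken_retinex(image: Image, illumination_scale: float) -> Image:
    """Simulate low light under the Retinex view I = R * L.

    Scaling the illumination L by a factor in (0, 1] scales every
    observed pixel by the same factor.
    """
    return [[pixel * illumination_scale for pixel in row] for row in image]


def add_haze(image: Image, transmission: float, airlight: float) -> Image:
    """Simulate haze with the atmospheric scattering model I = J * t + A * (1 - t).

    A single global transmission t is assumed here for simplicity.
    """
    return [
        [pixel * transmission + airlight * (1.0 - transmission) for pixel in row]
        for row in image
    ]


def composite_degradation(
    image: Image,
    illumination_scale: float = 0.4,  # hypothetical severity setting
    transmission: float = 0.6,        # hypothetical severity setting
    airlight: float = 0.9,            # hypothetical severity setting
) -> Image:
    """Apply low light first, then haze (the order is an assumption)."""
    return add_haze(darken_retinex(image, illumination_scale), transmission, airlight)


if __name__ == "__main__":
    clean = [[0.5, 1.0], [0.0, 0.25]]
    print(composite_degradation(clean))
```

Varying `illumination_scale` and `transmission` gives degradations of different severity, which is the kind of control the degradation prompts in the abstract are described as expressing.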

Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

May 23, 2024

Zhenyu Wei, Yujie He, Zhanchuan Cai

Abstract: RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The use of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The outputs of MDETrack's unified feature extractor are fed to the tracking head and the auxiliary depth estimation head, which sit side by side. The auxiliary module is discarded at inference, so inference speed is unchanged. We evaluated our models with various training strategies on multiple datasets, and the results show improved tracking accuracy even without real depth. Through these findings we highlight the potential of depth estimation in enhancing object tracking performance.
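
The structure described in the abstract (one shared feature extractor feeding a tracking head and an auxiliary depth head, with the auxiliary head dropped at inference) can be sketched in plain Python. This is a toy illustration under stated assumptions, not MDETrack itself: the class name, the callables standing in for network modules, and the weighted-sum loss with `depth_weight` are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Features = List[float]


class AuxiliaryDepthTracker:
    """Toy stand-in for a tracker with a shared extractor and two heads."""

    def __init__(
        self,
        extractor: Callable[[List[float]], Features],
        tracking_head: Callable[[Features], List[float]],
        depth_head: Optional[Callable[[Features], List[float]]] = None,
    ) -> None:
        self.extractor = extractor
        self.tracking_head = tracking_head
        self.depth_head = depth_head  # used during training only

    def forward(
        self, frame: List[float], training: bool
    ) -> Tuple[List[float], Optional[List[float]]]:
        # Both heads consume the same shared features.
        features = self.extractor(frame)
        box = self.tracking_head(features)
        # The auxiliary head is skipped at inference, so it adds no cost there.
        depth = None
        if training and self.depth_head is not None:
            depth = self.depth_head(features)
        return box, depth


def total_loss(
    tracking_loss: float, depth_loss: Optional[float], depth_weight: float = 0.1
) -> float:
    """Combine the two losses; the weighted sum is an assumption of this sketch."""
    if depth_loss is None:
        return tracking_loss
    return tracking_loss + depth_weight * depth_loss


if __name__ == "__main__":
    model = AuxiliaryDepthTracker(
        extractor=lambda frame: [2.0 * v for v in frame],
        tracking_head=lambda feats: [sum(feats)],
        depth_head=lambda feats: [v + 1.0 for v in feats],
    )
    print(model.forward([1.0, 2.0], training=True))
    print(model.forward([1.0, 2.0], training=False))
```

Because the depth head only influences the shared extractor through the training loss, removing it afterwards leaves the tracking path untouched, which is consistent with the abstract's statement that inference speed stays the same.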
