Abstract: Marine Saliency Segmentation (MSS) plays a pivotal role in various vision-based marine exploration tasks. However, existing marine segmentation techniques struggle with object mislocalization and imprecise boundaries due to the complex underwater environment. Meanwhile, despite the impressive performance of diffusion models in visual segmentation, there remains untapped potential in leveraging contextual semantics to enhance feature learning for region-level salient objects and thereby improve segmentation outcomes. Building on this insight, we propose DiffMSS, a novel marine saliency segmenter based on the diffusion model, which uses semantic knowledge distillation to guide the segmentation of marine salient objects. Specifically, we design a region-word similarity matching mechanism to identify salient terms at the word level in text descriptions. These high-level semantic features guide the conditional feature learning network, via semantic knowledge distillation, to generate salient and accurate diffusion conditions. To further refine the segmentation of fine-grained structures in unique marine organisms, we develop a dedicated consensus deterministic sampling strategy that suppresses overconfident missegmentations. Comprehensive experiments demonstrate the superior performance of DiffMSS over state-of-the-art methods in both quantitative and qualitative evaluations.
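As a rough illustration of the region-word similarity matching idea, the sketch below scores word-level text embeddings against region-level visual features and keeps the highest-scoring words as "salient terms". This is a minimal sketch under our own assumptions, not DiffMSS's actual implementation: the function name, feature dimensions, and top-k selection rule are hypothetical.

```python
import torch
import torch.nn.functional as F

def region_word_similarity(region_feats, word_feats, top_k=3):
    """Hypothetical region-word matching: score each word against every
    region and keep the words with the strongest region response.

    region_feats: (R, D) region-level visual features
    word_feats:   (W, D) word-level text embeddings
    """
    # L2-normalize so the dot product equals cosine similarity.
    r = F.normalize(region_feats, dim=-1)   # (R, D)
    w = F.normalize(word_feats, dim=-1)     # (W, D)
    sim = r @ w.t()                         # (R, W) region-word scores

    # Treat a word as salient if it matches some region strongly:
    # take each word's best region score, then the top-k words overall.
    word_saliency = sim.max(dim=0).values   # (W,)
    k = min(top_k, word_feats.size(0))
    top_scores, top_idx = word_saliency.topk(k)
    return top_idx, top_scores

# Toy usage with random tensors standing in for real encoder outputs.
regions = torch.randn(16, 256)  # e.g., pooled region features
words = torch.randn(10, 256)    # e.g., text-encoder word embeddings
idx, scores = region_word_similarity(regions, words)
print(idx, scores)
```

In a full pipeline, the selected word embeddings would presumably be distilled into the conditional feature learning network as the high-level semantic guidance the abstract describes.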
Abstract: Underwater imaging often suffers from severe visual degradation, which limits its suitability for downstream applications. While recent underwater image enhancement (UIE) methods build on advances in deep neural network architecture design, there is still considerable room for improvement in cross-scene robustness and computational efficiency. Diffusion models have shown great success in image generation, prompting us to consider their application to UIE. However, applying them directly to UIE poses two challenges, i.e., a high computational budget and color-unbalanced perturbations. To tackle these issues, we propose DiffColor, a distribution-aware diffusion and cross-spectral refinement model for efficient UIE. Instead of diffusing in the raw pixel space, we transform the image into the wavelet domain to obtain low-frequency and high-frequency spectra; this inherently halves the spatial dimensions with each transformation. Unlike image restoration tasks with a single noise type, underwater imaging exhibits unbalanced channel distributions due to the selective absorption of light by water. To address this, we design a Global Color Correction (GCC) module to handle diverse color shifts, thereby avoiding global degradation disturbances during the denoising process. To recover the image details sacrificed to underwater scattering, we further present Cross-Spectral Detail Refinement (CSDR), which enhances the high-frequency details; these are integrated with the low-frequency signal as input conditions for guiding the diffusion. This design not only ensures the high fidelity of sampled content but also compensates for the lost details. Comprehensive experiments demonstrate the superior performance of DiffColor over state-of-the-art methods in both quantitative and qualitative evaluations.
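To make the wavelet-domain claim concrete, the sketch below shows one level of a 2D Haar DWT using PyWavelets: the image splits into a low-frequency approximation band and three high-frequency detail bands, each at half the original resolution, and the inverse transform reconstructs the full resolution. This is only a minimal illustration of the general transform, not DiffColor's code; the choice of the Haar wavelet and the stand-in image are assumptions.

```python
import numpy as np
import pywt  # PyWavelets

# Stand-in grayscale image; a real pipeline would operate per channel.
img = np.random.rand(256, 256).astype(np.float32)

# One decomposition level: cA is the low-frequency band; cH, cV, cD
# are the horizontal, vertical, and diagonal high-frequency bands.
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
print(cA.shape)                      # (128, 128): half the resolution
print(cH.shape, cV.shape, cD.shape)  # detail bands, same halved size

# The inverse DWT restores the original resolution, so a diffusion model
# can sample in the compact low-frequency band and be mapped back.
rec = pywt.idwt2((cA, (cH, cV, cD)), "haar")
assert rec.shape == img.shape
```

This halving is why diffusing on wavelet coefficients rather than raw pixels reduces the computational budget: each level shrinks the spatial grid the denoiser must process by a factor of four.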