Abstract:With the rapid development of marine engineering projects such as marine resource extraction and oceanic surveys, underwater visual imaging and analysis has become a critical technology. Unfortunately, due to the inevitable non-linear attenuation of light in underwater environments, underwater images and videos often suffer from low contrast, blurriness, and color degradation, which significantly complicate the subsequent research. Existing underwater image enhancement methods often treat the haze and color cast as a unified degradation process and disregard their independence and interdependence, which limits the performance improvement. Here, we propose a Vision Transformer (ViT)-based network (referred to as WaterFormer) to improve the underwater image quality. WaterFormer contains three major components: a dehazing block (DehazeFormer Block) to capture the self-correlated haze features and extract deep-level features, a Color Restoration Block (CRB) to capture self-correlated color cast features, and a Channel Fusion Block (CFB) to capture fusion features within the network. To ensure authenticity, a soft reconstruction layer based on the underwater imaging physics model is included. To improve the quality of the enhanced images, we introduce the Chromatic Consistency Loss and Sobel Color Loss to train the network. Comprehensive experimental results demonstrate that WaterFormer outperforms other state-of-the-art methods in enhancing underwater images.