In this paper, we propose a novel underwater image enhancement method, by utilizing the multi-guided diffusion model for iterative enhancement. Unlike other image enhancement tasks, underwater images suffer from the unavailability of real reference images. Although existing works exploit synthetic images, manually selected well-enhanced images as reference images, to train enhancement networks, their enhancement performance always comes with subjective preferences that are inherited from the manual selection. To address this issue, we also use the image synthesis strategy, but the synthetic images derive from in-air natural images degraded into corresponding underwater images, guided by the underwater domain. Based on this strategy, the diffusion model can learn the prior knowledge of image enhancement from the underwater degradation domain to the real in-air natural domain. However, it is inevitable to fine-tune the model to suit downstream tasks, and this may erase the prior knowledge. To mitigate this, we combine the prior knowledge from the in-air natural domain with Contrastive Language-Image Pretraining (CLIP) to train a classifier for controlling the diffusion model generation process. Moreover, for image enhancement tasks, we find that the image-to-image diffusion model and the CLIP-Classifier mainly act in the high-frequency region during the fine-tuning process. Therefore, we propose a fast fine-tuning strategy focusing on the high-frequency region, which can be up to 10 times faster than the traditional strategy. Extensive experiments demonstrate that our method, abbreviated as CLIP-UIE, exhibit a more natural appearance.