Zhengbo Xu

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Jul 03, 2024

Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu

Abstract: Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), enabling open-domain image translation via user-provided text prompts. This paper proposes the frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that offers a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of our framework is a feature-space frequency-domain filtering module based on the Discrete Cosine Transform (DCT), which filters the latent features of the source image in the DCT domain, yielding filtered image features that carry different DCT spectral bands and serve as different control signals for the pre-trained Latent Diffusion Model. We reveal that control signals from different DCT spectral bands bridge the source image and the T2I-generated image through different correlations (e.g., style, structure, layout, contour), and thus enable versatile I2I applications emphasizing different I2I correlations, including style-guided content creation, image semantic manipulation, image scene translation, and image style translation. Unlike related approaches, FCDiffusion establishes a unified text-guided I2I framework suitable for diverse image translation tasks, simply by switching among different frequency control branches at inference time. Extensive qualitative and quantitative experiments demonstrate the effectiveness and superiority of our method for text-guided I2I. The code is publicly available at: https://github.com/XiangGao1102/FCDiffusion.

* Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI 2024), 38(3), 1824-1832
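To make the idea of "filtering latent features in the DCT domain" concrete, here is a minimal NumPy sketch, assuming a latent tensor of shape (channels, height, width) and an orthonormal 2D DCT-II applied per channel. The band definition (keeping coefficients whose frequency index sum u+v falls in a range), the cutoff values, and the names `dct_matrix` and `band_filter` are hypothetical choices for illustration; this is not the FCDiffusion implementation, whose actual filters are in the linked repository.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (rows = frequencies), so its inverse is its transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return m * np.sqrt(2.0 / n)


def band_filter(latent: np.ndarray, low: int, high: int) -> np.ndarray:
    """Keep only the DCT coefficients with low <= u + v < high, per channel.

    Hypothetical band definition for illustration only.
    """
    _, h, w = latent.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT per channel: C = Dh @ X @ Dw^T
    coeffs = np.einsum("uh,chw,vw->cuv", dh, latent, dw)
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    mask = ((u + v) >= low) & ((u + v) < high)
    # Inverse 2D DCT of the masked spectrum: X' = Dh^T @ (C * mask) @ Dw
    return np.einsum("uh,cuv,vw->chw", dh, coeffs * mask, dw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 8, 8))  # stand-in for a source-image latent
    # Hypothetical low / mid / high bands covering all 8x8 index sums (0..14)
    for name, (lo, hi) in {"low": (0, 4), "mid": (4, 8), "high": (8, 15)}.items():
        print(name, float(np.abs(band_filter(z, lo, hi)).mean()))
```

Because the bands here partition the spectrum, the three filtered latents sum back to the original; in a setup like the one the abstract describes, each band-limited latent would instead be fed to a separate control branch, and switching branches selects which correlation (style, layout, contour, and so on) is preserved.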

Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation

Oct 07, 2022

Wenjing Wang, Zhengbo Xu, Haofeng Huang, Jiaying Liu

Abstract: Low-light conditions not only degrade the human visual experience but also reduce the performance of downstream machine analytics. Although many works have been designed for low-light enhancement or domain-adaptive machine analytics, the former give less consideration to high-level vision, while the latter neglect the potential of image-level signal adjustment. How to restore underexposed images and videos from the perspective of machine vision has long been overlooked. In this paper, we are the first to propose a learnable illumination enhancement model for high-level vision. Inspired by real camera response functions, we assume that the illumination enhancement function should be a concave curve, and we propose to satisfy this concavity through a discrete integral. To adapt illumination from the perspective of machine vision without task-specific annotated data, we design an asymmetric cross-domain self-supervised training strategy. Our model architecture and training design benefit each other, forming a powerful unsupervised normal-to-low light adaptation framework. Comprehensive experiments demonstrate that our method surpasses existing low-light enhancement and adaptation methods and shows superior generalization on various low-light vision tasks, including classification, detection, action recognition, and optical flow estimation. Project website: https://daooshee.github.io/SACC-Website/

* This paper has been accepted by ACM Multimedia 2022
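The phrase "satisfy this concavity through a discrete integral" can be illustrated with a small sketch: if a curve is built by summing (integrating) slopes that are positive and non-increasing, it is guaranteed to be monotonically increasing and concave. The NumPy code below assumes unconstrained parameters `raw` that are mapped to positive increments with softplus and accumulated into non-increasing slopes; the names `concave_curve` and `enhance` and this exact parameterization are hypothetical and are not taken from the SACC paper, where the curve is learned by the model.

```python
import numpy as np


def softplus(x: np.ndarray) -> np.ndarray:
    """Numerically stable log(1 + exp(x)); always positive."""
    return np.logaddexp(0.0, x)


def concave_curve(raw: np.ndarray) -> np.ndarray:
    """Map K unconstrained parameters to curve values at K+1 evenly spaced knots in [0, 1].

    Hypothetical parameterization for illustration only.
    """
    increments = softplus(raw)                    # d_j > 0
    slopes = np.cumsum(increments[::-1])[::-1]    # s_k = sum_{j>=k} d_j: positive, decreasing
    values = np.concatenate([[0.0], np.cumsum(slopes)])  # discrete integral of the slopes
    return values / values[-1]                    # normalize so f(0) = 0 and f(1) = 1


def enhance(image: np.ndarray, curve: np.ndarray) -> np.ndarray:
    """Apply the curve to intensities in [0, 1] by piecewise-linear interpolation."""
    knots = np.linspace(0.0, 1.0, len(curve))
    return np.interp(image, knots, curve)


if __name__ == "__main__":
    curve = concave_curve(np.array([0.3, -1.0, 2.0, 0.0, -0.5]))
    dark = np.array([[0.05, 0.10], [0.20, 0.40]])  # toy underexposed intensities
    print(np.round(curve, 3))
    print(np.round(enhance(dark, curve), 3))
```

Since the curve is concave with f(0) = 0 and f(1) = 1, it lies on or above the identity line, so applying it can only brighten intensities, which matches the intuition of an illumination enhancement curve. Concavity holds for any parameter values, so gradient-based training cannot push the curve out of the constrained family.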
