Title: Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization

Sep 15, 2024

Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang

Abstract: Direct Preference Optimization (DPO) has recently been extended from aligning large language models (LLMs) to aligning text-to-image models with human preferences, which has generated considerable interest within the community. However, we observe that these approaches rely solely on minimizing the reverse Kullback-Leibler divergence between the fine-tuned model and the reference model during the alignment process, and do not consider other divergence constraints. In this study, we extend the reverse Kullback-Leibler divergence in the alignment paradigm of text-to-image models to the broader family of $f$-divergences, with the aim of achieving better alignment performance while preserving generation diversity. We provide the generalized formula of the alignment paradigm under the $f$-divergence condition and analyze the impact of different divergence constraints on the alignment process from the perspective of gradient fields. We conduct a comprehensive evaluation of image-text alignment, human value alignment, and generation diversity under different divergence constraints, and the results indicate that alignment based on the Jensen-Shannon divergence achieves the best trade-off among them. The choice of divergence used for aligning text-to-image models significantly affects the trade-off between alignment performance (especially human value alignment) and generation diversity, which highlights the need to select an appropriate divergence for practical applications.
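As a rough illustration of the divergence family the abstract refers to, the sketch below evaluates the textbook $f$-divergence $D_f(P \| Q) = \sum_x Q(x)\, f(P(x)/Q(x))$ for small discrete distributions. It is not the paper's alignment objective, which is not given on this page. The names `f_divergence` and `GENERATORS` are hypothetical, the toy distributions are made up, and the code assumes both distributions are strictly positive. In the alignment literature, $P$ usually plays the role of the fine-tuned model and $Q$ the reference model, and $\mathrm{KL}(P \| Q)$ is what is commonly called the reverse KL divergence.

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)).

    Assumes p and q are strictly positive and each sums to 1.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Generator functions f(u); each is convex on u > 0 and satisfies f(1) = 0.
GENERATORS = {
    # f(u) = u log u gives KL(P || Q).
    "kl_p_q": lambda u: u * math.log(u),
    # f(u) = -log u gives KL(Q || P).
    "kl_q_p": lambda u: -math.log(u),
    # Scaled by 0.5 so the result is the standard Jensen-Shannon divergence,
    # which is bounded above by log 2.
    "jensen_shannon": lambda u: 0.5 * (
        u * math.log(u) - (u + 1) * math.log((u + 1) / 2)
    ),
}

if __name__ == "__main__":
    # Toy distributions over three outcomes (illustrative values only).
    p = [0.7, 0.2, 0.1]
    q = [0.4, 0.4, 0.2]
    for name, f in GENERATORS.items():
        print(name, f_divergence(p, q, f))
```

Swapping the generator changes how strongly a mismatch between $P$ and $Q$ is penalized; the two KL generators grow without bound as the ratio $P(x)/Q(x)$ becomes extreme, while the Jensen-Shannon divergence stays bounded.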

* 32 pages
