Anastasia Yaschenko

Samsung AI Center, Higher School of Economics

DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization

May 27, 2025

Shamil Ayupov, Maksim Nakhodnov, Anastasia Yaschenko, Andrey Kuznetsov, Aibek Alanov

Abstract: Personalized diffusion models have shown remarkable success in Text-to-Image (T2I) generation by enabling the injection of user-defined concepts into diverse contexts. However, balancing concept fidelity with contextual alignment remains a challenging open problem. In this work, we propose an RL-based approach that leverages the diverse outputs of T2I models to address this issue. Our method eliminates the need for human-annotated scores by generating a synthetic paired dataset for DPO-like training using external quality metrics. These better-worse pairs are specifically constructed to improve both concept fidelity and prompt adherence. Moreover, our approach supports flexible adjustment of the trade-off between image fidelity and textual alignment. Through multi-step training, our approach outperforms a naive baseline in convergence speed and output quality. We conduct extensive qualitative and quantitative analysis, demonstrating the effectiveness of our method across various architectures and fine-tuning techniques. The source code can be found at https://github.com/ControlGenAI/DreamBoothDPO.

* The first two authors contributed equally.
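
The pair-construction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` fields, the weighted-sum `combined_score`, the `weight` trade-off knob, and the `margin` filter are all hypothetical choices made here for illustration. The sketch only shows how automatically computed metric scores, rather than human annotations, could be turned into better-worse pairs for DPO-like training.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Sample:
    """One generated image with two externally computed scores (assumed, higher is better)."""
    image_id: str
    fidelity: float   # hypothetical concept-fidelity metric
    alignment: float  # hypothetical prompt-alignment metric


def combined_score(sample: Sample, weight: float) -> float:
    """Weighted sum of the two metrics; weight=1.0 favours fidelity, 0.0 favours alignment."""
    return weight * sample.fidelity + (1.0 - weight) * sample.alignment


def build_pairs(samples, weight=0.5, margin=0.05):
    """Return (better_id, worse_id) tuples for pairs whose score gap is at least `margin`."""
    pairs = []
    for a, b in combinations(samples, 2):
        gap = combined_score(a, weight) - combined_score(b, weight)
        if gap >= margin:
            pairs.append((a.image_id, b.image_id))
        elif -gap >= margin:
            pairs.append((b.image_id, a.image_id))
        # Pairs with a gap below the margin are treated as ties and skipped.
    return pairs


samples = [
    Sample("a", fidelity=0.9, alignment=0.2),
    Sample("b", fidelity=0.3, alignment=0.8),
    Sample("c", fidelity=0.5, alignment=0.5),
]
fidelity_pairs = build_pairs(samples, weight=1.0)
alignment_pairs = build_pairs(samples, weight=0.0)
```

Changing `weight` flips which image in a pair counts as the better one, which is one simple way to picture an adjustable trade-off between image fidelity and textual alignment.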

SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation

Jun 18, 2024

Polina Karpikova, Andrei Spiridonov, Anna Vorontsova, Anastasia Yaschenko, Ekaterina Radionova, Igor Medvedev, Alexander Limonov

Abstract: Self-portraits captured from a short distance might look unnatural or even unattractive due to heavy distortions that deform facial features and due to ill-placed head poses. In this paper, we propose SUPER, a novel method for eliminating distortions and adjusting head pose in a close-up face crop. We perform 3D GAN inversion for a facial image by optimizing camera parameters and a face latent code, which yields a generated image. In addition, we estimate depth from the obtained latent code, create a depth-induced 3D mesh, and render it with updated camera parameters to obtain a warped portrait. Finally, we apply visibility-based blending so that visible regions are reprojected and occluded parts are restored with a generative model. Experiments on face undistortion benchmarks and on our self-collected Head Rotation dataset (HeRo) show that SUPER outperforms previous approaches both qualitatively and quantitatively, opening new possibilities for photorealistic selfie editing.

FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits

Apr 20, 2023

Polina Karpikova, Ekaterina Radionova, Anastasia Yaschenko, Andrei Spiridonov, Leonid Kostyushko, Riccardo Fabbricatore, Aleksei Ivakhnenko

Abstract: Generative DNNs are a powerful tool for image synthesis, but they are limited by their computational load. On the other hand, given a trained model and a task, e.g. face generation within a range of characteristics, the output image quality will be unevenly distributed among images with different characteristics. It follows that we might restrict the model's complexity on some instances while maintaining high quality. We propose a method for reducing computations by adding so-called early exit branches to the original architecture and dynamically switching the computational path depending on how difficult it will be to render the output. We apply our method to two different SOTA models performing generative tasks, generation from a semantic map and cross-reenactment of face expressions, showing that it is able to output images with custom lower-quality thresholds. For a threshold of LPIPS ≤ 0.1, we reduce their computations by up to a half. This is especially relevant for real-time applications such as synthesis of faces, where quality loss needs to be contained but most of the inputs need fewer computations than the complex instances.

* 12 pages, 22 figures
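
The early-exit idea in the abstract can be pictured with a minimal sketch. It is an assumption-laden illustration, not the paper's mechanism: it supposes that some hypothetical predictor already supplies an estimated LPIPS value for each exit branch (ordered from cheapest to most expensive), and that each branch has a known relative cost. The function names `choose_exit` and `mean_relative_cost` are invented here.

```python
def choose_exit(predicted_lpips, threshold=0.1):
    """Pick the cheapest exit whose predicted LPIPS is within the threshold.

    `predicted_lpips` lists one estimate per early-exit branch, cheapest first.
    Returns the branch index, or len(predicted_lpips) to mean "run the full model".
    """
    for index, distance in enumerate(predicted_lpips):
        if distance <= threshold:
            return index
    return len(predicted_lpips)


def mean_relative_cost(batch_predictions, branch_costs, threshold=0.1):
    """Average compute cost over a batch, relative to the full model (cost 1.0).

    `branch_costs` holds one cost per early exit plus a final entry for the full model.
    """
    total = sum(branch_costs[choose_exit(p, threshold)] for p in batch_predictions)
    return total / len(batch_predictions)


# Three early exits plus the full model (hypothetical relative costs).
branch_costs = [0.25, 0.5, 0.75, 1.0]
batch = [
    [0.25, 0.12, 0.08],  # only the third exit is good enough
    [0.05, 0.04, 0.03],  # easy input: the first exit suffices
    [0.30, 0.20, 0.15],  # hard input: falls back to the full model
]
exits = [choose_exit(p) for p in batch]
average_cost = mean_relative_cost(batch, branch_costs)
```

Under these made-up numbers, easy inputs leave the network early while hard ones use the full path, so the average cost drops below that of always running the full model.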
