Ahmet Berke Gokmen

RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

May 19, 2025

Ahmet Berke Gokmen, Yigit Ekin, Bahri Batuhan Bilecen, Aysegul Dundar

Abstract: We propose RoPECraft, a training-free video motion transfer method for diffusion transformers that operates solely by modifying their rotary positional embeddings (RoPE). We first extract dense optical flow from a reference video, and utilize the resulting motion offsets to warp the complex-exponential tensors of RoPE, effectively encoding motion into the generation process. These embeddings are then further optimized during denoising time steps via trajectory alignment between the predicted and target velocities using a flow-matching objective. To keep the output faithful to the text prompt and prevent duplicate generations, we incorporate a regularization term based on the phase components of the reference video's Fourier transform, projecting the phase angles onto a smooth manifold to suppress high-frequency artifacts. Experiments on benchmarks reveal that RoPECraft outperforms all recently published methods, both qualitatively and quantitatively.

* https://berkegokmen1.github.io/RoPECraft/
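
The abstract says RoPECraft warps RoPE's complex-exponential tensors with optical-flow offsets. The sketch below illustrates that idea on a simplified 2D token grid; it is not the authors' code. It assumes a spatial-only RoPE that splits channels evenly between the x and y axes, a flow field already expressed in token units, and the hypothetical helper names `rope_freqs`, `warped_rope_2d`, and `apply_rope`. The temporal axis, the optimization of the embeddings during denoising, and the Fourier-phase regularizer are left out.

```python
import numpy as np


def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies for one axis: base**(-2k/dim)."""
    return base ** (-np.arange(0, dim, 2) / dim)


def warped_rope_2d(height: int, width: int, flow: np.ndarray,
                   dim_per_axis: int = 8) -> np.ndarray:
    """Complex-exponential RoPE tensor for an H x W token grid whose
    positions are displaced by `flow` (shape (H, W, 2) holding dx, dy).

    Returns an array of shape (H, W, dim_per_axis) of unit complex numbers:
    the first half rotates by x position, the second half by y position.
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    x_pos = xs + flow[..., 0]  # assumption: flow is already in token units
    y_pos = ys + flow[..., 1]
    freqs = rope_freqs(dim_per_axis)            # (dim_per_axis // 2,)
    ang_x = x_pos[..., None] * freqs            # (H, W, dim_per_axis // 2)
    ang_y = y_pos[..., None] * freqs
    return np.exp(1j * np.concatenate([ang_x, ang_y], axis=-1))


def apply_rope(q: np.ndarray, rope: np.ndarray) -> np.ndarray:
    """Rotate real features q of shape (H, W, 2 * rope.shape[-1]) by treating
    adjacent channel pairs as complex numbers and multiplying by `rope`."""
    q_c = q[..., 0::2] + 1j * q[..., 1::2]
    out = q_c * rope
    # Re-interleave real/imaginary parts back into the original channel layout.
    return np.stack([out.real, out.imag], axis=-1).reshape(q.shape)
```

Because each token's rotation depends only on its (warped) position, a uniform flow of one token in x reproduces the unwarped embedding of the neighbouring column, while a non-uniform flow moves each token's positional phase along its own motion offset. The rotation is unitary, so feature norms are unchanged.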

Identity Preserving 3D Head Stylization with Multiview Score Distillation

Nov 20, 2024

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar

Abstract: 3D head stylization transforms realistic facial features into artistic representations, enhancing user engagement across gaming and virtual reality applications. While 3D-aware generators have made significant advancements, many 3D stylization methods primarily provide near-frontal views and struggle to preserve the unique identities of original subjects, often resulting in outputs that lack diversity and individuality. This paper addresses these challenges by leveraging the PanoHead model, synthesizing images from a comprehensive 360-degree perspective. We propose a novel framework that employs negative log-likelihood distillation (LD) to enhance identity preservation and improve stylization quality. By integrating multi-view grid score and mirror gradients within the 3D GAN architecture and introducing a score rank weighing technique, our approach achieves substantial qualitative and quantitative improvements. Our findings not only advance the state of 3D head stylization but also provide valuable insights into effective distillation processes between diffusion models and GANs, focusing on the critical issue of identity preservation. Please visit https://three-bee.github.io/head_stylization for more visuals.

* https://three-bee.github.io/head_stylization

Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images

Sep 30, 2024

Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Aysegul Dundar

Abstract: 3D GAN inversion aims to project a single image into the latent space of a 3D Generative Adversarial Network (GAN), thereby achieving 3D geometry reconstruction. While there exist encoders that achieve good results in 3D GAN inversion, they are predominantly built on EG3D, which specializes in synthesizing near-frontal views and is limited in synthesizing comprehensive 3D scenes from diverse viewpoints. In contrast to existing approaches, we propose a novel framework built on PanoHead, which excels in synthesizing images from a 360-degree perspective. To achieve realistic 3D modeling of the input image, we introduce a dual encoder system tailored for high-fidelity reconstruction and realistic generation from different viewpoints. Accompanying this, we propose a stitching framework on the triplane domain to get the best predictions from both. To achieve seamless stitching, both encoders must output consistent results despite being specialized for different tasks. For this reason, we carefully train these encoders using specialized losses, including an adversarial loss based on our novel occlusion-aware triplane discriminator. Experiments reveal that our approach surpasses the existing encoder training methods qualitatively and quantitatively. Please visit the project page: https://berkegokmen1.github.io/dual-enc-3d-gan-inv.

* The first two authors contributed equally. Accepted to NeurIPS 2024
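
The abstract describes stitching the outputs of two encoders in the triplane domain but does not state the stitching rule. One plausible form, sketched below purely as an assumption and not as the authors' method, is a per-location blend driven by a visibility mask: features visible in the input image come from the reconstruction-oriented encoder, and occluded regions come from the generation-oriented encoder. The function name `stitch_triplanes`, the tensor layout (3 planes × C channels × R × R), and how the mask is obtained are all hypothetical.

```python
import numpy as np


def stitch_triplanes(tri_recon: np.ndarray, tri_gen: np.ndarray,
                     visible: np.ndarray) -> np.ndarray:
    """Blend two triplanes of shape (3, C, R, R) (hypothetical layout).

    visible: soft mask of shape (3, 1, R, R) in [0, 1]; 1 means the location
    is visible in the input image (take tri_recon), 0 means occluded
    (take tri_gen). The mask broadcasts over the channel axis.
    """
    if tri_recon.shape != tri_gen.shape:
        raise ValueError("triplanes must have the same shape")
    if visible.min() < 0 or visible.max() > 1:
        raise ValueError("visibility mask must lie in [0, 1]")
    # Convex per-location combination of the two encoders' features.
    return visible * tri_recon + (1.0 - visible) * tri_gen
```

A soft mask with intermediate values near the visibility boundary makes the transition gradual; as the abstract notes, the blend only looks seamless if the two encoders already produce consistent features where they meet.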