Gonzalo Martin Garcia

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Sep 17, 2024

Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe

Abstract: Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200$\times$ faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. We surprisingly find that this fine-tuning protocol also works directly on Stable Diffusion and achieves comparable performance to current state-of-the-art diffusion-based depth and normal estimation models, calling into question some of the conclusions drawn from prior works.

* Project page: https://vision.rwth-aachen.de/diffusion-e2e-ft
