Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Junjie Hou

Generative Dataset Distillation Based on Diffusion Model

Aug 16, 2024

Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

Figure 1 for Generative Dataset Distillation Based on Diffusion Model

Figure 2 for Generative Dataset Distillation Based on Diffusion Model

Abstract:This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative model for CIFAR-100 and Tiny-ImageNet datasets, we need to use a generative model that can generate images at high speed. In this study, we proposed a novel generative dataset distillation method based on Stable Diffusion. Specifically, we use the SDXL-Turbo model which can generate images at high speed and quality. Compared to other diffusion models that can only generate images per class (IPC) = 1, our method can achieve an IPC = 10 for Tiny-ImageNet and an IPC = 20 for CIFAR-100, respectively. Additionally, to generate high-quality distilled datasets for CIFAR-100 and Tiny-ImageNet, we use the class information as text prompts and post data augmentation for the SDXL-Turbo model. Experimental results show the effectiveness of the proposed method, and we achieved third place in the generative track of the ECCV 2024 DD Challenge. Codes are available at https://github.com/Guang000/BANKO.

* The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

Via

Access Paper or Ask Questions

D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Jul 21, 2024

Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang

Figure 1 for D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Figure 2 for D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Figure 3 for D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Figure 4 for D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Abstract:Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization and the distillation space relies on the matching architecture. Nevertheless, these approaches either suffer significant computational costs on large-scale datasets or experience performance decline on cross-architectures. We advocate for designing an economical dataset distillation framework that is independent of the matching architectures. With empirical observations, we argue that constraining the consistency of the real and synthetic image spaces will enhance the cross-architecture generalization. Motivated by this, we introduce Dataset Distillation via Disentangled Diffusion Model (D$^4$M), an efficient framework for dataset distillation. Compared to architecture-dependent methods, D$^4$M employs latent diffusion model to guarantee consistency and incorporates label information into category prototypes. The distilled datasets are versatile, eliminating the need for repeated generation of distinct datasets for various architectures. Through comprehensive experiments, D$^4$M demonstrates superior performance and robust generalization, surpassing the SOTA methods across most aspects.

* Accepted to CVPR 2024

Via

Access Paper or Ask Questions