Title: On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Dec 06, 2023

Peng Sun, Bei Shi, Daiwei Yu, Tao Lin

Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces high computational demands. Dataset distillation, a recently emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, which hinders its practicality and feasibility. To this end, we re-examine the existing dataset distillation methods and identify three properties required for large-scale real-world applications, namely, realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally efficient yet effective data distillation paradigm, to enable both diversity and realism of the distilled data. Extensive empirical results over various neural architectures and datasets demonstrate the advancement of RDED: we can distill the full ImageNet-1K to a small dataset comprising 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (while the SOTA only achieves 21% but requires 6 hours).
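
To put the headline numbers in perspective, the following back-of-the-envelope sketch works out the size of the distilled set and the reported gaps in time and accuracy. It is only arithmetic on figures quoted in the abstract; the class count and training-set size of ImageNet-1K (1,000 classes, 1,281,167 training images) are assumptions taken from the standard benchmark, not from the abstract itself, and the variable names are illustrative.

```python
# Assumed ImageNet-1K statistics (standard benchmark values, not stated in the abstract).
NUM_CLASSES = 1_000
FULL_TRAIN_IMAGES = 1_281_167

# Figures quoted in the abstract.
IMAGES_PER_CLASS = 10
RDED_MINUTES = 7
SOTA_MINUTES = 6 * 60
RDED_TOP1 = 42.0
SOTA_TOP1 = 21.0

# Size of the distilled dataset and the fraction of the original it represents.
distilled_images = NUM_CLASSES * IMAGES_PER_CLASS
fraction_kept = distilled_images / FULL_TRAIN_IMAGES

# Ratios between the reported distillation times and top-1 accuracies.
speedup = SOTA_MINUTES / RDED_MINUTES
accuracy_ratio = RDED_TOP1 / SOTA_TOP1

print(f"distilled images: {distilled_images}")
print(f"fraction of training set kept: {fraction_kept:.2%}")
print(f"distillation speedup: {speedup:.1f}x")
print(f"top-1 accuracy ratio: {accuracy_ratio:.1f}x")
```

Under these assumptions the distilled set holds 10,000 images, which is under 1% of the original training set.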

* 17 pages, 20 figures
