Title: Will Large-scale Generative Models Corrupt Future Datasets?

Nov 15, 2022

Ryuichiro Hataya, Han Bao, Hiromi Arai

Abstract: Recently proposed large-scale text-to-image generative models such as DALL·E 2, Midjourney, and Stable Diffusion can generate high-quality, realistic images from users' prompts. These models are used not only by the research community but also by ordinary Internet users, and consequently a tremendous number of generated images have been shared on the Internet. Meanwhile, today's success of deep learning in computer vision owes a lot to images collected from the Internet. These trends lead us to a research question: "Will such generated images impact the quality of future datasets and the performance of computer vision models positively or negatively?" This paper answers the question empirically by simulating contamination. Namely, we generate ImageNet-scale and COCO-scale datasets using a state-of-the-art generative model and evaluate models trained on "contaminated" datasets on various tasks, including image classification and image generation. From these experiments, we conclude that generated images negatively affect downstream performance, although the extent of the effect depends on the task and the amount of generated images. The generated datasets are available via https://github.com/moskomule/dataset-contamination.
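
One way to picture the contamination simulation mentioned in the abstract is to replace a chosen fraction of a real training set with generated samples while keeping the total size fixed. The sketch below is illustrative only: the abstract does not specify the mixing protocol, so the function name `contaminate`, the `ratio` parameter, the fixed-size replacement scheme, and the file names are all assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

# (image path, class label, is_generated flag); this layout is hypothetical.
Sample = Tuple[str, int, bool]


def contaminate(
    real: Sequence[Tuple[str, int]],
    generated: Sequence[Tuple[str, int]],
    ratio: float,
    seed: int = 0,
) -> List[Sample]:
    """Replace a fraction `ratio` of the real samples with generated ones.

    The output has the same size as `real`. This is an assumed protocol
    for illustration, not the procedure used in the paper.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_total = len(real)
    n_generated = round(ratio * n_total)
    if n_generated > len(generated):
        raise ValueError("not enough generated samples for this ratio")

    rng = random.Random(seed)  # seeded for reproducible splits
    kept_real = rng.sample(list(real), n_total - n_generated)
    picked_generated = rng.sample(list(generated), n_generated)

    mixed = [(path, label, False) for path, label in kept_real]
    mixed += [(path, label, True) for path, label in picked_generated]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for real and generated image lists.
    real = [(f"real_{i}.jpg", i % 2) for i in range(10)]
    generated = [(f"gen_{i}.jpg", i % 2) for i in range(10)]
    mixed = contaminate(real, generated, ratio=0.4)
    print(sum(flag for _, _, flag in mixed), "of", len(mixed), "samples are generated")
```

Sweeping `ratio` over several values and training the same model on each mixture would give a rough view of how performance changes with the amount of generated data, which is the kind of dependence the abstract reports.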
