Title: A Method for Curation of Web-Scraped Face Image Datasets

Apr 07, 2020

Kai Zhang, Vítor Albiero, Kevin W. Bowyer

Abstract: Web-scraped, in-the-wild datasets have become the norm in face recognition research. These datasets usually contain very large numbers of subjects and images, with image counts in the millions. Collecting a dataset in the wild raises a variety of issues, including images with the wrong identity label, duplicate images, duplicate subjects, and variation in quality. With millions of images, manual cleaning is not feasible, but the fully automated methods used to date yield datasets that are less clean than desired. We propose a semi-automated method whose goal is a clean dataset for testing face recognition methods, with similar quality across men and women, to support comparison of accuracy across gender. Our approach removes near-duplicate images, merges duplicate subjects, corrects mislabeled images, and removes images outside a defined range of pose and quality. We conduct the curation on the Asian Face Dataset (AFD) and the VGGFace2 test dataset. The experiments show that a state-of-the-art method achieves much higher accuracy on the datasets after they are curated. Finally, we release our cleaned versions of both datasets to the research community.
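
As a rough illustration of the near-duplicate removal step named in the abstract, the sketch below greedily drops images whose feature vectors are almost identical to one already kept. This is not the authors' procedure, which the abstract does not detail. The function names, the use of cosine similarity on per-image embeddings, and the 0.95 threshold are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.95):
    """Return indices of images to keep for one subject.

    An image is kept only if its similarity to every previously kept
    image is below `threshold` (a hypothetical value, not from the paper).
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings (assumed): the second vector nearly repeats the first.
subject_embeddings = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
kept_indices = drop_near_duplicates(subject_embeddings)
```

In a real pipeline the embeddings would come from a face matcher and the threshold would need tuning. A semi-automated workflow such as the one the abstract describes would presumably route borderline cases to a human reviewer rather than decide them automatically.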

* This paper will appear at IWBF 2020
