Abstract:The cost of data annotation is a substantial impediment for multi-label image classification: in every image, every category must be labeled as present or absent. Single positive multi-label (SPML) learning is a cost-effective solution, where models are trained on a single positive label per image. Thus, SPML is a more challenging domain, since it requires dealing with missing labels. In this work, we propose a method to turn single positive data into fully-labeled data: Pseudo Multi-Labels. Basically, a teacher network is trained on single positive labels. Then, we treat the teacher model's predictions on the training data as ground-truth labels to train a student network on fully-labeled images. With this simple approach, we show that the performance achieved by the student model approaches that of a model trained on the actual fully-labeled images.
Abstract:Annotating data for multi-label classification is prohibitively expensive because every category of interest must be confirmed to be present or absent. Recent work on single positive multi-label (SPML) learning shows that it is possible to train effective multi-label classifiers using only one positive label per image. However, the standard benchmarks for SPML are derived from traditional multi-label classification datasets by retaining one positive label for each training example (chosen uniformly at random) and discarding all other labels. In realistic settings it is not likely that positive labels are chosen uniformly at random. This work introduces protocols for studying label bias in SPML and provides new empirical results.