Abstract: The success of deep neural networks in traditional keypoint detection tasks encourages researchers to tackle new problems and collect more complex datasets. The size of the DeepFashion2 dataset poses a new challenge for keypoint detection, as it comprises 13 clothing categories that span a wide range of keypoints (294 in total). Predicting all keypoints directly leads to high memory consumption, slow training, and slow inference. This paper studies the keypoint grouping approach and how it affects the performance of the CenterNet architecture. We propose a simple and efficient automatic grouping technique with a powerful post-processing method and apply it to the DeepFashion2 fashion landmark task and the MS COCO pose estimation task. This reduces memory consumption and processing time during inference by up to 19% and 30%, respectively, and during training by 28% and 26%, respectively, without compromising accuracy.
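To make the memory argument concrete, the following is a minimal sketch of why grouping shrinks the keypoint output. It is not the automatic grouping technique proposed in the paper: it uses a naive positional grouping in which the k-th keypoint of every category shares one heatmap channel. The per-category keypoint counts, the category names, and the function names are assumptions introduced for illustration; the counts are chosen to be consistent with the 13 categories and 294 keypoints stated above.

```python
from typing import Dict, Tuple

# Assumed keypoints per DeepFashion2 category (illustrative; they sum to 294).
KEYPOINTS_PER_CATEGORY: Dict[str, int] = {
    "short_sleeve_top": 25,
    "long_sleeve_top": 33,
    "short_sleeve_outwear": 31,
    "long_sleeve_outwear": 39,
    "vest": 15,
    "sling": 15,
    "shorts": 10,
    "trousers": 14,
    "skirt": 8,
    "short_sleeve_dress": 29,
    "long_sleeve_dress": 37,
    "vest_dress": 19,
    "sling_dress": 19,
}


def _check_index(category: str, k: int) -> None:
    """Raise IndexError if k is not a valid keypoint index for the category."""
    count = KEYPOINTS_PER_CATEGORY[category]
    if not 0 <= k < count:
        raise IndexError(f"{category} has {count} keypoints, got index {k}")


def ungrouped_channel(category: str, k: int) -> int:
    """Heatmap channel when every (category, keypoint) pair has its own channel."""
    _check_index(category, k)
    offset = 0
    for name, count in KEYPOINTS_PER_CATEGORY.items():
        if name == category:
            return offset + k
        offset += count
    raise KeyError(category)  # unreachable after _check_index, kept for clarity


def grouped_channel(category: str, k: int) -> int:
    """Heatmap channel under naive positional grouping (hypothetical scheme).

    The k-th keypoint of every category maps to channel k, so the category
    predicted for the object is needed to interpret a channel at decode time.
    """
    _check_index(category, k)
    return k


def channel_counts() -> Tuple[int, int]:
    """Return (ungrouped, grouped) numbers of keypoint heatmap channels."""
    counts = KEYPOINTS_PER_CATEGORY.values()
    return sum(counts), max(counts)


if __name__ == "__main__":
    ungrouped, grouped = channel_counts()
    print(f"ungrouped channels: {ungrouped}, grouped channels: {grouped}")
```

Under these assumptions the keypoint head shrinks from one channel per keypoint to one channel per group, which is the kind of saving the abstract attributes to grouping. A semantic or automatic grouping, as studied in the paper, would choose which keypoints share a channel more carefully than this positional rule does.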
Abstract: This paper proposes a single-stage approach for fast clothing detection, built as a modification of the multi-target network CenterNet. We introduce several powerful post-processing techniques that can be applied to improve the quality of keypoint localization. Together, the semantic keypoint grouping approach and these post-processing techniques achieve a state-of-the-art accuracy of 0.737 mAP for the bounding box detection task and 0.591 mAP for the landmark detection task on the DeepFashion2 validation dataset. We also achieved second place in the DeepFashion2 Challenge 2020 with 0.582 mAP on the test dataset. The proposed approach can also be used on low-power devices, where it retains relatively high accuracy without any post-processing techniques.