Abstract:Image classification has long been an active and challenging task. This paper is a brief report on our submission to the VIPriors Image Classification Challenge. The main difficulty in this challenge is training the model from scratch, without any pretrained weights. In our method, several strong backbones and multiple loss functions are used to learn more representative features. To improve the models' generalization and robustness, we use efficient image augmentation strategies such as AutoAugment and CutMix. Finally, ensemble learning is used to further improve performance. The final Top-1 accuracy of our team, DeepBlueAI, is 0.7015, ranking second on the leaderboard.
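As an illustration of the CutMix augmentation named in the abstract, the following is a minimal sketch in plain Python, not the team's implementation. It assumes images are 2D lists of equal shape and that the box is supplied by the caller (in practice it is usually sampled at random); the function names `cutmix` and `mix_labels` are hypothetical.

```python
def cutmix(img_a, img_b, box):
    """Paste the region `box` of img_b into a copy of img_a.

    img_a, img_b: 2D lists (H x W) of pixel values with the same shape.
    box: (top, left, bottom, right), half-open, in pixel coordinates.
    Returns the mixed image and lam, the fraction of pixels kept from img_a.
    """
    h, w = len(img_a), len(img_a[0])
    top, left, bottom, right = box
    # Clip the box to the image bounds.
    top, bottom = max(0, top), min(h, bottom)
    left, right = max(0, left), min(w, right)
    mixed = [row[:] for row in img_a]  # copy so img_a is not modified
    for r in range(top, bottom):
        mixed[r][left:right] = img_b[r][left:right]
    area = max(0, bottom - top) * max(0, right - left)
    lam = 1.0 - area / (h * w)
    return mixed, lam


def mix_labels(onehot_a, onehot_b, lam):
    """Blend two one-hot label vectors in proportion to the pixel areas."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(onehot_a, onehot_b)]
```

The mixed label weights each class by the share of pixels it contributes, so the training target stays consistent with what the model actually sees.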
Abstract:This paper is a brief report on our submission to the VIPriors Action Recognition Challenge. Action recognition has attracted many researchers' attention for its wide application, but it remains challenging. In this paper, we study previous methods and propose our own. Our method primarily improves on the SlowFast network and fuses it with TSM for further gains. We also use a fast but effective way to extract motion features from videos by using residual frames as input. Residual frames with SlowFast yield better motion features, and the residual-frame-input path is an excellent supplement to existing RGB-frame-input models. Combining 3D convolution (SlowFast) with 2D convolution (TSM) improves performance further. All of the above experiments were trained from scratch on UCF101.
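The residual-frame input mentioned above can be sketched as the difference between consecutive frames. This is a minimal illustration under assumptions: frames are 2D lists of intensities, the residual is taken as next frame minus current frame, and the name `residual_frames` is hypothetical. The abstract does not state the exact formulation the authors use.

```python
def residual_frames(frames):
    """Return the T-1 differences between consecutive frames.

    frames: list of T frames, each a 2D list (H x W) of intensities.
    Static background cancels out, so what remains is mostly motion.
    """
    residuals = []
    for prev, nxt in zip(frames, frames[1:]):
        residuals.append(
            [[n - p for p, n in zip(prev_row, next_row)]
             for prev_row, next_row in zip(prev, nxt)]
        )
    return residuals
```

A clip of T frames gives T-1 residual frames, which could then be fed to a video backbone in place of, or alongside, the RGB frames.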
Abstract:This paper is a brief report on our submission to the Recognizing Families In the Wild Data Challenge (4th Edition), held in conjunction with FG 2020 Forum. Automatic kinship recognition has attracted many researchers' attention for its wide application, but it is still a very challenging task because of the limited information available for determining whether a pair of faces are blood relatives. In this paper, we study previous methods and propose our own. We tried a range of approaches, such as deep metric learning, to extract a deep embedding feature for every image, and then determined whether a pair are blood relatives using Euclidean distance or a class-based method. We also found that some tricks, such as sampling more negative samples and using higher-resolution images, help improve performance. Moreover, we propose a symmetric network with a binary-classification-based method, which gave our best score in all tasks.
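The distance-based decision described above can be sketched as follows. This is only an illustration: the embeddings are assumed to be plain lists of floats produced by some face-embedding model, the threshold is a hypothetical value that would be tuned on validation pairs, and `euclidean_distance` and `is_kin` are names introduced here, not from the paper.

```python
import math


def euclidean_distance(u, v):
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_kin(emb_a, emb_b, threshold):
    """Predict kinship when the two face embeddings are closer than threshold.

    The threshold is an assumption; it is not specified in the abstract.
    """
    return euclidean_distance(emb_a, emb_b) < threshold
```

Because the distance is symmetric, the prediction does not depend on the order in which the two faces are given.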
Abstract:Person re-identification has attracted many researchers' attention for its wide application, but it is still a very challenging task because only part of the image information can be used for person matching. Most current methods use CNNs to learn embeddings that capture semantic similarity among data points. Many state-of-the-art methods use complex network structures with multiple branches that fuse multiple features during training or testing, with classification loss, triplet loss, or a combination of the two as the loss function. However, methods that use triplet loss converge slowly, and methods that pull features of the same class as close together as possible in feature space lead to poor feature stability. Building on the ranking-motivated structured loss, this paper proposes a new metric learning loss function under which features of the same class are sparsely distributed within small hyperspheres and features of different classes are uniformly distributed at a clear angle. We also adopt a new single-branch network structure that achieves strong performance using only the global feature. The validity of our method is verified on the Market1501 and DukeMTMC-reID person re-identification datasets: it achieves 90.9% rank-1 accuracy and 80.8% mAP on DukeMTMC-reID, and 95.3% rank-1 accuracy and 88.7% mAP on Market1501. Code and models are available on GitHub: https://github.com/Qidian213/Ranked_Person_ReID.
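For reference, the standard triplet loss that the abstract contrasts against can be sketched as below. This is the common hinge formulation, not the loss proposed in the paper; the margin value and the function names are assumptions made for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows linearly.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Triplets that already satisfy the margin contribute zero loss and therefore no gradient, which is one commonly cited reason for the slow convergence noted in the abstract.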