Abstract: Due to complex factors such as occlusion, pose variation and diverse camera perspectives, extracting strong feature representations for person re-identification remains a challenging task. In this paper, we propose a novel transformer-based person re-identification framework that combines self-supervision and supervision, namely SSSC-TransReID. Different from general transformer-based person re-identification models, we design a self-supervised contrastive learning branch, which can enhance the feature representation for person re-identification without negative samples or additional pre-training. To train the contrastive learning branch, we also propose a novel random rectangle mask strategy that simulates occlusions in real scenes, so as to enhance the robustness of the feature representation to occlusion. Finally, we utilize a joint-training loss function to integrate the advantages of supervised learning with ID tags and self-supervised contrastive learning without negative samples, which reinforces the ability of our model to mine more discriminative features, especially under occlusion. Extensive experimental results on several benchmark datasets show that our proposed model consistently obtains superior Re-ID performance and outperforms state-of-the-art ReID methods by large margins in mean average precision (mAP) and Rank-1 accuracy.
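The abstract does not give implementation details; as a rough Python sketch under our own assumptions, the random rectangle mask and the joint-training loss could look as follows (the function names, masking ratio, the cosine-agreement form of the negative-free contrastive term, and the weight `lambda_ssl` are illustrative assumptions, not the authors' actual implementation).

```python
# Illustrative sketch only: random rectangle masking to simulate occlusion,
# plus a joint supervised ID loss + negative-free self-supervised loss.
import torch
import torch.nn.functional as F


def random_rectangle_mask(img, mask_ratio=0.3, fill=0.0):
    """Zero out a random rectangle of a (C, H, W) image to simulate occlusion."""
    _, h, w = img.shape
    mh, mw = int(h * mask_ratio), int(w * mask_ratio)
    top = torch.randint(0, h - mh + 1, (1,)).item()
    left = torch.randint(0, w - mw + 1, (1,)).item()
    masked = img.clone()
    masked[:, top:top + mh, left:left + mw] = fill
    return masked


def joint_loss(id_logits, labels, feat_orig, feat_masked, lambda_ssl=1.0):
    """Supervised ID loss plus a negative-free contrastive term that pulls the
    features of the original and masked views together (assumed form)."""
    id_loss = F.cross_entropy(id_logits, labels)
    ssl_loss = 1.0 - F.cosine_similarity(feat_orig, feat_masked, dim=-1).mean()
    return id_loss + lambda_ssl * ssl_loss
```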
Abstract: Recent human pose estimation methods mainly focus on designing more effective deep network structures as human feature extractors, and most of these feature extraction networks only introduce the position of each anatomical keypoint to guide their training process. However, we found that some human anatomical keypoints preserve topological invariance, which can help localize them more accurately when detecting keypoints on the feature map; to the best of our knowledge, no existing work has specifically studied this property. Thus, in this paper, we present a novel 2D human pose estimation method with explicit anatomical keypoint structure constraints, which introduces into the loss function a topology constraint term consisting of the differences in keypoint-to-keypoint distance and direction between predictions and their ground truth. More importantly, our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods and improve their performance. Extensive experiments on the COCO keypoint benchmark dataset show that our method performs favorably against most existing bottom-up and top-down human pose estimation methods; in particular, when our model is plugged into Lite-HRNet, its AP scores rise by 2.9\% and 3.3\% on the COCO val2017 and test-dev2017 datasets, respectively.
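A minimal sketch of how such a topology constraint term could be computed, assuming it compares predicted and ground-truth keypoint-to-keypoint (bone) lengths and directions over a fixed edge list; the function name, edge representation and weights are our own assumptions, not the authors' code.

```python
# Illustrative sketch only: distance + direction constraint over keypoint pairs.
import torch


def topology_constraint_loss(pred, gt, edges, w_dist=1.0, w_dir=1.0, eps=1e-6):
    """pred, gt: (K, 2) keypoint coordinates; edges: list of (i, j) index pairs."""
    i, j = zip(*edges)
    i, j = torch.tensor(i), torch.tensor(j)
    pred_vec = pred[i] - pred[j]          # predicted bone vectors
    gt_vec = gt[i] - gt[j]                # ground-truth bone vectors
    # distance term: absolute difference of bone lengths
    dist_term = (pred_vec.norm(dim=-1) - gt_vec.norm(dim=-1)).abs().mean()
    # direction term: 1 - cosine similarity of bone directions
    cos = (pred_vec * gt_vec).sum(-1) / (
        pred_vec.norm(dim=-1) * gt_vec.norm(dim=-1) + eps)
    dir_term = (1.0 - cos).mean()
    return w_dist * dist_term + w_dir * dir_term
```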
Abstract: Recently, correlation filters have been widely applied to unmanned aerial vehicle (UAV) tracking due to their high frame rates, robustness and low computational cost. However, they are fragile because of two inherent defects, i.e., the boundary effect and filter corruption. Some methods mitigate the boundary effect by enlarging the search area, yet this introduces undesired background distractors. Other approaches alleviate the temporal degeneration of learned filters by introducing a temporal regularizer, which relies on the assumption that the filters of consecutive frames should be coherent. In fact, the filter at the ($t-1$)-th frame is sometimes vulnerable to heavy occlusion from the background, which causes this assumption to fail. To address these issues, in this work we propose a novel $\ell_{1}$-regularized correlation filter with adaptive contextual learning and keyfilter selection for UAV tracking. Firstly, we adaptively detect the positions of effective contextual distractors with the aid of the distribution of local maxima on the response map of the current frame, which is generated using the previous correlation filter model. Next, we eliminate inconsistent labels for the tracked target by removing the label assigned to each distractor, and we develop a new scoring scheme for each distractor. Then, we select the keyfilter from the filter pool by finding the maximal similarity between the target at the current frame and the target template corresponding to each filter in the pool. Finally, quantitative and qualitative experiments on three authoritative UAV datasets show that the proposed method is superior to state-of-the-art tracking methods based on the correlation filter framework.
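The keyfilter selection step could be sketched as follows; the similarity measure (normalized cross-correlation between patches) and all names are our own assumptions for illustration, since the abstract does not specify how similarity is computed.

```python
# Illustrative sketch only: pick, from a pool of stored filters, the one whose
# associated target template is most similar to the current target patch.
import numpy as np


def select_keyfilter(current_patch, filter_pool):
    """filter_pool: list of (filter, template) pairs with templates shaped like
    current_patch; returns the filter with the most similar template."""
    def ncc(a, b):
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float((a * b).mean())

    sims = [ncc(current_patch, tmpl) for _, tmpl in filter_pool]
    best = int(np.argmax(sims))
    return filter_pool[best][0]
```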
Abstract: Recently, part-based and support vector machine (SVM) based trackers have shown favorable performance. Nonetheless, their time-consuming online training and updating processes limit their real-time applications. To better handle partial occlusion and improve efficiency, we propose a novel part-based structural support correlation filter tracking method, which absorbs the strong discriminative ability of SVMs and the insensitivity of part-based tracking methods to partial occlusion. Our proposed model learns the support correlation filters of all parts jointly via a star-structure model, which preserves the spatial layout among parts and tolerates part outliers. In addition, to further mitigate drift away from the object, we introduce inter-frame consistencies of local parts into our model. Finally, our model accurately estimates the scale change of the object from the relative distance changes among reliable parts. Extensive empirical evaluations on three benchmark datasets, OTB2015, TempleColor128 and VOT2015, demonstrate that the proposed method performs favorably against several state-of-the-art trackers in terms of tracking accuracy, speed and robustness.
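A minimal sketch of scale estimation from reliable parts, assuming the scale change is taken as a robust statistic (here the median) of pairwise distance ratios between the current and previous part centers; the aggregation choice and names are illustrative assumptions, not necessarily the authors' method.

```python
# Illustrative sketch only: estimate scale change from relative distance
# changes among reliable part centers.
import numpy as np


def estimate_scale_change(prev_centers, curr_centers):
    """prev_centers, curr_centers: (P, 2) arrays of reliable-part centers."""
    ratios = []
    n = len(prev_centers)
    for a in range(n):
        for b in range(a + 1, n):
            d_prev = np.linalg.norm(prev_centers[a] - prev_centers[b])
            d_curr = np.linalg.norm(curr_centers[a] - curr_centers[b])
            if d_prev > 1e-6:
                ratios.append(d_curr / d_prev)
    # median of pairwise ratios is robust to a few unreliable parts
    return float(np.median(ratios)) if ratios else 1.0
```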