Abstract: Gait recognition is an emerging identification technology that distinguishes individuals at long distances by analyzing their walking patterns. Traditional techniques rely heavily on large-scale labeled datasets, which incurs high costs and significant labeling challenges. Recently, researchers have explored unsupervised gait recognition with clustering-based unsupervised domain adaptation methods and achieved notable success. However, these methods directly use the pseudo-labels generated by clustering and neglect the pseudo-label noise caused by domain differences, which degrades model training. To mitigate these issues, we propose a novel model called GaitDCCR, which aims to reduce the influence of noisy pseudo-labels on clustering and model training. Our approach is divided into two main stages: a clustering stage and a training stage. In the clustering stage, we propose Dynamic Cluster Parameters (DCP) and Dynamic Weight Centroids (DWC) to improve the efficiency of clustering and obtain reliable cluster centroids. In the training stage, we employ the classical teacher-student structure and propose Confidence-based Pseudo-label Refinement (CPR) and a Contrastive Teacher Module (CTM) to encourage noisy samples to converge towards the clusters containing their true identities. Extensive experiments on public gait datasets demonstrate that our simple and effective method significantly enhances the performance of unsupervised gait recognition, laying the foundation for its application in the real world. The code is available at https://github.com/YanSun-github/GaitDCCR
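For concreteness, the minimal sketch below illustrates how confidence-weighted centroids and confidence-based pseudo-label refinement could look in NumPy. The cosine-similarity weighting scheme, the softmax confidence measure, the threshold, and all function names are our assumptions for illustration, not the exact formulation of DWC and CPR in the paper.

    import numpy as np

    def dynamic_weighted_centroids(features, labels, tau=0.1):
        # One centroid per cluster; each member contributes in proportion to
        # its cosine similarity to the plain mean, down-weighting outliers
        # that likely carry noisy pseudo-labels (our reading of DWC).
        centroids = {}
        for c in np.unique(labels[labels >= 0]):  # -1 marks un-clustered noise
            members = features[labels == c]
            mean = members.mean(axis=0)
            sim = members @ mean / (np.linalg.norm(members, axis=1)
                                    * np.linalg.norm(mean) + 1e-8)
            w = np.exp(sim / tau)
            w /= w.sum()
            centroids[c] = (w[:, None] * members).sum(axis=0)
        return centroids

    def refine_pseudo_labels(features, centroids, conf_thresh=0.6):
        # A sample keeps a pseudo-label only when its softmax similarity to
        # the nearest centroid is confident enough; otherwise it is marked -1
        # and can be excluded from the loss (our reading of CPR).
        keys = sorted(centroids)
        C = np.stack([centroids[k] for k in keys])
        sims = features @ C.T / (np.linalg.norm(features, axis=1, keepdims=True)
                                 * np.linalg.norm(C, axis=1) + 1e-8)
        probs = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
        conf, best = probs.max(axis=1), probs.argmax(axis=1)
        return np.where(conf > conf_thresh, np.array(keys)[best], -1)

In this reading, refinement runs once per epoch after clustering: centroids are rebuilt from the current features, and only samples whose refined label is not -1 contribute to the classification loss.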
Abstract: Gait recognition is a promising biometric technology for identification owing to its non-invasiveness and effectiveness at long distances. However, external variations such as clothing changes and viewpoint differences pose significant challenges to gait recognition. Silhouette-based methods preserve body shape but neglect internal structural information, while skeleton-based methods preserve structural information but omit appearance. To fully exploit the complementary nature of the two modalities, this paper proposes TriGait, a novel triple-branch gait recognition framework. It effectively integrates features from skeleton and silhouette data in a hybrid fusion manner: a two-stream network extracts static and motion features from appearance, a simple yet effective module named JSA-TC captures dependencies between all joints, and a third branch performs cross-modal learning by aligning and fusing the low-level features of the two modalities. Experimental results demonstrate the superiority and effectiveness of TriGait for gait recognition. The proposed method achieves a mean rank-1 accuracy of 96.0% over all conditions on the CASIA-B dataset and 94.3% under the clothing-change (CL) condition, significantly outperforming state-of-the-art methods. The source code will be available at https://github.com/feng-xueling/TriGait/.
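As a structural illustration only, the PyTorch sketch below shows one plausible shape of such a triple-branch hybrid fusion model. The backbone depths, channel sizes, the self-attention stand-in for JSA-TC, and the additive alignment used for the cross-modal branch are all assumptions rather than the authors' architecture.

    import torch
    import torch.nn as nn

    class TriBranchSketch(nn.Module):
        # Structural sketch of a triple-branch silhouette+skeleton model
        # (not the authors' code); dimensions are illustrative.
        def __init__(self, feat_dim=128):
            super().__init__()
            # Branch 1: two-stream appearance branch on silhouettes.
            # Static stream sees raw frames; motion stream sees frame diffs.
            self.static_stream = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            self.motion_stream = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            # Branch 2: joint branch (stand-in for JSA-TC), modeling
            # dependencies between all joints with self-attention.
            self.joint_embed = nn.Linear(3, feat_dim)  # (x, y, conf) per joint
            self.joint_attn = nn.MultiheadAttention(feat_dim, num_heads=4,
                                                    batch_first=True)
            # Branch 3: cross-modal branch, aligning low-level features of
            # both modalities to a shared dimension before fusing them.
            self.sil_align = nn.Linear(32, feat_dim)
            self.skel_align = nn.Linear(feat_dim, feat_dim)
            self.head = nn.Linear(32 + 32 + 2 * feat_dim, feat_dim)

        def forward(self, sils, skels):
            # sils: (B, T, 1, H, W) silhouettes; skels: (B, T, J, 3) keypoints
            B, T = sils.shape[:2]
            static = self.static_stream(sils.flatten(0, 1)).flatten(1)
            static = static.view(B, T, -1).mean(1)            # (B, 32)
            diffs = (sils[:, 1:] - sils[:, :-1]).flatten(0, 1)
            motion = self.motion_stream(diffs).flatten(1)
            motion = motion.view(B, T - 1, -1).mean(1)         # (B, 32)
            joints = self.joint_embed(skels.mean(1))           # (B, J, D)
            attn, _ = self.joint_attn(joints, joints, joints)
            skel_feat = attn.mean(1)                           # (B, D)
            fused = self.sil_align(static) + self.skel_align(skel_feat)
            return self.head(torch.cat([static, motion, skel_feat, fused], 1))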
Abstract: Gait recognition is one of the most promising video-based biometric technologies. Silhouette edges and motion are the most informative features, and previous studies have explored them separately and achieved notable results. However, due to occlusions and variations in viewing angle, recognition performance often suffers under predefined spatial segmentation strategies. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address these issues, we propose a novel gait recognition framework, denoted GaitASMS, which effectively extracts adaptive structured spatial representations and naturally aggregates multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates silhouette edges using an adaptive edge mask and maximizes the representation in a semantic latent space. The Multi-Scale Temporal Aggregation Module (MSTA) effectively models long- and short-range temporal information through a temporal aggregation structure. Furthermore, we propose a new data augmentation, denoted random mask, to enrich the sample space of long-term occlusions and enhance the generalization of the model. Extensive experiments conducted on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes, i.e., BG and CL. On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and outperforms the baseline in rank-1 accuracy by 3.4% and 6.3% under BG and CL, respectively. Ablation experiments demonstrate the effectiveness of ASRE and MSTA.
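To make the random-mask idea concrete, the sketch below zeroes out one randomly placed horizontal band across every frame of a silhouette sequence, mimicking a long-term occlusion. The band orientation, size range, and application probability are our assumptions, not the paper's settings.

    import torch

    def random_mask(seq, max_ratio=0.3, p=0.5):
        # seq: (T, H, W) silhouette sequence. With probability p, occlude one
        # random horizontal band of height up to max_ratio * H in all frames,
        # so the occlusion persists over time (a "long-term" occlusion).
        if torch.rand(1).item() > p:
            return seq
        T, H, W = seq.shape
        band = int(H * max_ratio * torch.rand(1).item()) + 1
        top = torch.randint(0, H - band + 1, (1,)).item()
        out = seq.clone()
        out[:, top:top + band, :] = 0  # same band masked in every frame
        return out

    # Usage: augmented = random_mask(torch.rand(30, 64, 44))

Applying the same band to all frames, rather than re-sampling it per frame, is what distinguishes this from frame-level cutout: the network must recognize the sequence despite a body region that is missing for the whole clip.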