Abstract: Most current gait recognition methods suffer from poor interpretability and high computational cost. To improve interpretability, we investigate gait features in the embedding space based on Koopman operator theory. The transition matrix in this space, namely the Koopman operator, captures the complex kinematic features of gait cycles. The diagonal elements of the operator matrix represent the overall motion trend, providing a physically meaningful descriptor. To reduce computational cost, we use a reversible autoencoder to reduce the model size and eliminate convolutional layers to reduce its depth, resulting in fewer floating-point operations. Experimental results on multiple datasets show that our method reduces computational cost to 1% of that of state-of-the-art methods while achieving a competitive recognition accuracy of 98% on non-occlusion datasets.
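To illustrate the idea of a linear transition matrix in an embedding space, here is a minimal sketch. It assumes the gait embeddings are already available as a sequence of vectors, and it fits the matrix by ordinary least squares (as in dynamic mode decomposition). This is a stand-in for illustration, not the training procedure of the paper. The function names `fit_transition_matrix` and `diagonal_descriptor` and the toy matrix `K_true` are hypothetical.

```python
import numpy as np


def fit_transition_matrix(Z):
    """Fit K so that Z[t + 1] is approximately K @ Z[t] (least squares).

    Z has shape (T, d): one d-dimensional embedding per time step.
    """
    X, Y = Z[:-1], Z[1:]
    # Solve X @ K.T ~= Y for K.T, then transpose to get K.
    K_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K_T.T


def diagonal_descriptor(K):
    """Return the diagonal of K as a compact per-dimension descriptor."""
    return np.diag(K).copy()


# Toy data (assumption): a noise-free trajectory generated by a known matrix.
K_true = np.array([
    [0.9, -0.2, 0.0],
    [0.2, 0.9, 0.0],
    [0.0, 0.0, 0.8],
])
trajectory = [np.array([1.0, 0.5, -1.0])]
for _ in range(29):
    trajectory.append(K_true @ trajectory[-1])
Z = np.stack(trajectory)  # shape (30, 3)

K_hat = fit_transition_matrix(Z)
print(diagonal_descriptor(K_hat))
```

On noise-free linear data such as this, the fitted matrix should match the generating matrix up to numerical precision. Real embeddings would come from a learned encoder and would generally be only approximately linear.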
Abstract: Traditional fine-grained image classification typically relies on large-scale training samples with ground-truth annotations. However, some sub-categories may have few available samples in real-world applications. In this paper, we propose a novel few-shot fine-grained image classification network (FicNet) using a multi-frequency neighborhood (MFN) module and double-cross modulation (DCM). The MFN module captures information in both the spatial and frequency domains; self-similarity and multi-frequency components are then extracted to produce a multi-frequency structural representation. DCM employs a bi-crisscross component and double 3D cross-attention components to modulate the embedding process by considering global context information and the subtle relationships between categories, respectively. Comprehensive experiments on three fine-grained benchmark datasets for two few-shot tasks verify that FicNet performs excellently compared with state-of-the-art methods. In particular, experiments on two datasets, "Caltech-UCSD Birds" and "Stanford Cars", yield classification accuracies of 93.17\% and 95.36\%, respectively, which are even higher than those achieved by general fine-grained image classification methods.