Abstract: Fine-Grained Visual Classification (FGVC) is a longstanding and fundamental problem in computer vision and pattern recognition, and it underpins a diverse set of real-world applications. This paper describes our FGVC-based contribution to SnakeCLEF2022. First, we design a strong multimodal backbone that uses various kinds of meta-information to assist fine-grained identification. Second, we propose new loss functions to address the long-tailed distribution of the dataset. Third, to take full advantage of unlabeled data, we jointly train with self-supervised and supervised learning to obtain a pre-trained model. We also apply several effective data processing tricks in our experiments. Finally, we fine-tune the models on the downstream task with hard mining and ensemble several kinds of models to improve performance. Extensive experiments demonstrate that our method effectively improves fine-grained recognition performance. Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, ranking first among the participants on the private leaderboard.