Abstract:Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). This model utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking for to both character images and tree nodes, enabling efficient and effective training. This approach accelerates training significantly (by a factor of 2 or more) while enhancing accuracy. Extensive experiments show that processing characters through formation trees aligns better with their inherent properties than direct sequential methods, significantly enhancing the generality and usability of the representations.
Abstract:Open-pit mine change detection (CD) in high-resolution (HR) remote sensing images plays a crucial role in mineral development and environmental protection. Significant progress has been made in this field in recent years, largely due to the advancement of deep learning techniques. However, existing deep-learning-based CD methods encounter challenges in effectively integrating neighborhood and scale information, resulting in suboptimal performance. Therefore, by exploring the influence patterns of neighborhood and scale information, this paper proposes an Integrated Neighborhood and Scale Information Network (INSINet) for open-pit mine CD in HR remote sensing images. Specifically, INSINet introduces 8-neighborhood-image information to acquire a larger receptive field, improving the recognition of center image boundary regions. Drawing on techniques of skip connection, deep supervision, and attention mechanism, the multi-path deep supervised attention (MDSA) module is designed to enhance multi-scale information fusion and change feature extraction. Experimental analysis reveals that incorporating neighborhood and scale information enhances the F1 score of INSINet by 6.40%, with improvements of 3.08% and 3.32% respectively. INSINet outperforms existing methods with an Overall Accuracy of 97.69%, Intersection over Union of 71.26%, and F1 score of 83.22%. INSINet shows significance for open-pit mine CD in HR remote sensing images.