Understanding the relationship between composition, microstructure, and properties is a long-standing research focus in materials science. The mechanical performance of solid-solution Mg-Gd alloys is strongly influenced by Gd content, dendritic structure, and the presence of secondary phases. To analyze and predict the impact of these factors, this study proposes a multimodal fusion learning framework based on image processing and deep learning. The framework integrates elemental composition with microstructural features to accurately predict the Vickers hardness of solid-solution Mg-Gd alloys. First, deep learning methods were used to extract microstructural information from solid-solution Mg-Gd alloy images collected from the literature and from experiments, yielding precise grain size and secondary-phase features for the performance prediction task. Next, these quantitative results were combined with Gd content to construct a performance prediction dataset. Finally, a regression model based on the Transformer architecture was used to predict the Vickers hardness of the Mg-Gd alloys. The experimental results indicate that the Transformer model achieved the best prediction accuracy, with an R^2 value of 0.9. In addition, SHAP analysis identified critical values for four key features affecting the Vickers hardness of Mg-Gd alloys, providing guidance for alloy design. These findings improve the understanding of alloy performance and offer theoretical support for future material design and optimization.
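
For readers unfamiliar with the metric, the R^2 value reported above is the coefficient of determination, 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the measured values from their mean. The following is a minimal sketch of that calculation, not the study's evaluation code: the function name `r_squared` is hypothetical, and the hardness values are placeholders invented purely for illustration, not data from this work.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Placeholder Vickers hardness values (HV), invented for illustration only.
measured = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 69.0, 81.0, 88.0]
score = r_squared(measured, predicted)  # SS_res = 10, SS_tot = 500
```

With these placeholder numbers the score is approximately 0.98. A model that always predicts the mean of the measured values scores 0, and a perfect model scores 1, which gives a sense of scale for the reported value of 0.9.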