Abstract: The stability and reliability of wireless data transmission in vehicular networks are challenged by highly dynamic path loss in complex, rapidly changing environments. This paper proposes a multi-modal environmental sensing-based path loss prediction architecture (MES-PLA) for vehicle-to-infrastructure (V2I) communications. First, we establish a joint acquisition platform for multi-modal environmental data and channel data, which generates a spatio-temporally synchronized and aligned dataset. Then, we design a multi-modal feature extraction and fusion network (MFEF-Net) for the environmental sensing data. MFEF-Net extracts features from RGB images, point cloud data, and GPS information and fuses them with an attention mechanism to leverage the strengths of each modality. Simulation results show that MES-PLA achieves a root mean square error (RMSE) of 2.20 dB, a notable improvement in prediction accuracy over single-modal sensing input. Moreover, MES-PLA is more stable than single-modal methods under varying illumination conditions.
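As an illustration of the two technical ideas named in the abstract, the following minimal sketch shows one possible way to fuse per-modality feature vectors with attention weights and to score predictions with RMSE. The abstract does not specify the attention formulation used in MFEF-Net, so the scaled dot-product scoring, the function names (`attention_fuse`, `rmse`), the `query` vector, and all numeric values are assumptions made for illustration only, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse same-length feature vectors, one per modality.

    features: dict mapping a modality name to a list of floats.
    query:    hypothetical query vector (learned in a real network).
    Returns the fused vector and the attention weight of each modality.
    """
    names = list(features)
    dim = len(query)
    # Scaled dot-product score between the query and each modality (assumed form).
    scores = [
        sum(q * f for q, f in zip(query, features[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality features, element by element.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def rmse(predicted, measured):
    """Root mean square error, in the same unit as the inputs (here dB)."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only; these are not data from the paper.
example_features = {
    "rgb": [0.2, 0.8, 0.1, 0.4],
    "point_cloud": [0.5, 0.1, 0.9, 0.3],
    "gps": [0.3, 0.3, 0.2, 0.7],
}
example_query = [1.0, 0.5, 0.5, 1.0]
fused_vector, modality_weights = attention_fuse(example_features, example_query)
```

In a trained network the query and the feature extractors would be learned jointly, and the fused vector would feed a regression head that outputs the predicted path loss in dB.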