Zinc electrolysis is one of the key processes in zinc smelting, and maintaining stable operation of zinc electrolysis is an important factor in ensuring production efficiency and product quality. However, poor contact between the zinc electrolysis cathode and the anode is a common problem that leads to reduced production efficiency and damage to the electrolysis cell. Therefore, online monitoring of the contact status of the plates is crucial for ensuring production quality and efficiency. To address this issue, we propose an end-to-end network, the Frequency-masked Multimodal Autoencoder (FM-AE). This method takes the cell voltage signal and infrared image information as input, and through automatic encoding, fuses the two features together and predicts the poor contact status of the plates through a cascaded detector. Experimental results show that the proposed method maintains high accuracy (86.2%) while having good robustness and generalization ability, effectively detecting poor contact status of the zinc electrolysis cell, providing strong support for production practice.