While Electrical Impedance Tomography (EIT) has found many biomedical applications, higher resolution is needed to support quantitative analysis in tissue engineering and regenerative medicine. This paper proposes an impedance-optical dual-modal imaging framework that primarily targets high-quality 3D cell culture imaging and can be extended to other tissue engineering applications. The framework comprises three components: an impedance-optical dual-modal sensor, a guidance image processing algorithm, and a deep learning model for information fusion, named the multi-scale feature cross fusion network (MSFCF-Net). MSFCF-Net takes two inputs: the EIT measurement and a binary mask image, which the guidance image processing algorithm generates from an RGB microscopic image. The network fuses the information from the two imaging modalities and generates the final conductivity image. We assess the performance of the proposed dual-modal framework through numerical simulation and MCF-7 cell imaging experiments. The results show that the proposed method can significantly improve image quality, indicating that impedance-optical joint imaging has the potential to reveal structural and functional information of tissue-level targets simultaneously.
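
To make the guidance step concrete, the conversion of an RGB microscopic image into a binary mask can be sketched as follows. The abstract does not specify the algorithm, so this sketch assumes Otsu thresholding on a grayscale conversion and a target that appears darker than the background. The function name `rgb_to_binary_mask` is hypothetical, and the code is an illustration under those assumptions, not the authors' method.

```python
import numpy as np


def rgb_to_binary_mask(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the guidance image processing step.

    Takes an (H, W, 3) RGB image with values in [0, 255] and returns
    an (H, W) uint8 mask with 1 for target pixels and 0 for background.
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights

    # Otsu's method: choose the split that maximizes between-class variance.
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 255.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)               # pixel count in bins 0..k
    w1 = w0[-1] - w0                   # pixel count in bins k+1..255
    s0 = np.cumsum(hist * centers)     # intensity sum in bins 0..k
    mu0 = s0 / np.maximum(w0, 1)       # mean of the darker class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the brighter class
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    threshold = edges[k + 1]           # upper edge of the last "dark" bin

    # Assumption: the target (e.g. a cell aggregate) is darker than background.
    return (gray < threshold).astype(np.uint8)
```

In a pipeline like the one described, such a mask would be supplied to the fusion network together with the EIT measurement. A real microscopy image would likely need additional steps, such as denoising or morphological cleanup, which are omitted here.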