Cross-modal fusion of different types of neuroimaging data has shown great promise for predicting the progression of Alzheimer's disease (AD). However, most existing methods applied in neuroimaging cannot efficiently fuse the functional and structural information contained in multi-modal neuroimages. In this work, a novel cross-modal transformer generative adversarial network (CT-GAN) is proposed to fuse the functional information contained in resting-state functional magnetic resonance imaging (rs-fMRI) and the structural information contained in diffusion tensor imaging (DTI). The developed bi-attention mechanism efficiently matches functional information to structural information and maximizes the capability of extracting complementary information from rs-fMRI and DTI. By capturing the deep complementary relationship between structural and functional features, the proposed CT-GAN can detect AD-related brain connectivity, which can serve as a biomarker of AD. Experimental results show that the proposed model not only improves classification performance but also effectively identifies AD-related brain connectivity.
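
To make the bi-attention idea concrete, the following PyTorch sketch shows one way a bi-directional cross-modal attention block could be implemented: functional (rs-fMRI) tokens attend over structural (DTI) tokens and vice versa, and the enriched representations are then fused. The class name `BiAttentionFusion`, the feature dimensions, and the layer choices are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class BiAttentionFusion(nn.Module):
    """Illustrative bi-directional cross-attention block for fusing
    per-region functional (rs-fMRI) and structural (DTI) features.
    All names and dimensions are assumptions, not the paper's code."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Functional features query structural features, and vice versa.
        self.func_to_struct = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.struct_to_func = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_f = nn.LayerNorm(embed_dim)
        self.norm_s = nn.LayerNorm(embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, func_feat: torch.Tensor, struct_feat: torch.Tensor) -> torch.Tensor:
        # func_feat, struct_feat: (batch, num_regions, embed_dim)
        # Each modality attends over the other, so both are enriched
        # with complementary cross-modal information.
        f_att, _ = self.func_to_struct(func_feat, struct_feat, struct_feat)
        s_att, _ = self.struct_to_func(struct_feat, func_feat, func_feat)
        f_out = self.norm_f(func_feat + f_att)   # residual connection + layer norm
        s_out = self.norm_s(struct_feat + s_att)
        return self.fuse(torch.cat([f_out, s_out], dim=-1))


# Usage example with 90 brain regions (an AAL-style parcellation is assumed)
# and 64-dimensional region embeddings.
if __name__ == "__main__":
    fmri = torch.randn(8, 90, 64)   # batch of functional region embeddings
    dti = torch.randn(8, 90, 64)    # batch of structural region embeddings
    fused = BiAttentionFusion()(fmri, dti)
    print(fused.shape)              # torch.Size([8, 90, 64])
```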