Fault diagnosis of rotating machinery plays an important role in the safety and stability of modern industrial systems. However, there is a distribution discrepancy between training data and data from real-world operating scenarios, which degrades the performance of existing systems. This paper proposes a transfer-learning-based method that uses acoustic and vibration signals to address this distribution discrepancy. We design MAVgram, an acoustic and vibration feature fusion, to provide richer and more reliable fault information, and pair it with a DNN-based classifier to obtain a more effective diagnostic representation. The backbone is pre-trained and then fine-tuned to obtain excellent performance on the target task. Experimental results demonstrate the effectiveness of the proposed method, which achieves improved performance compared with STgram-MFN.