Abstract:With the development of social media, rumors have been spread broadly on social media platforms, causing great harm to society. Beside textual information, many rumors also use manipulated images or conceal textual information within images to deceive people and avoid being detected, making multimodal rumor detection be a critical problem. The majority of multimodal rumor detection methods mainly concentrate on extracting features of source claims and their corresponding images, while ignoring the comments of rumors and their propagation structures. These comments and structures imply the wisdom of crowds and are proved to be crucial to debunk rumors. Moreover, these methods usually only extract visual features in a basic manner, seldom consider tampering or textual information in images. Therefore, in this study, we propose a novel Vision and Graph Fused Attention Network (VGA) for rumor detection to utilize propagation structures among posts so as to obtain the crowd opinions and further explore visual tampering features, as well as the textual information hidden in images. We conduct extensive experiments on three datasets, demonstrating that VGA can effectively detect multimodal rumors and outperform state-of-the-art methods significantly.