Fake news and misinformation are a matter of concern for people around the globe. Users of the internet and social media sites frequently encounter content containing false information. Fake news detection is one of the most prominent and widely studied areas of research, and its techniques typically apply popular machine learning and deep learning algorithms. Previous work in this domain has focused largely on detecting fake news in text circulating online. The platforms that have been most extensively observed and analyzed are news websites and Twitter; Facebook, Reddit, WhatsApp, YouTube, and other social applications are gradually gaining attention in this emerging field. Researchers are now analyzing online data across multiple modalities, including text, image, video, speech, and other contributing factors, and combining these modalities has resulted in efficient fake news detection. At present, there is an abundance of surveys consolidating textual fake news detection algorithms. This review instead deals primarily with multi-modal fake news detection techniques that cover images, videos, and their combinations with text. We provide a comprehensive literature survey of eighty articles presenting state-of-the-art detection techniques, thereby identifying research gaps and building a pathway for researchers to further advance this domain.