Colorectal cancer (CRC) micro-satellite instability (MSI) prediction on histopathology images is a challenging weakly supervised learning task that involves multi-instance learning on gigapixel images. To date, radiology images have proven to have CRC MSI information and efficient patient imaging techniques. Different data modalities integration offers the opportunity to increase the accuracy and robustness of MSI prediction. Despite the progress in representation learning from the whole slide images (WSI) and exploring the potential of making use of radiology data, CRC MSI prediction remains a challenge to fuse the information from multiple data modalities (e.g., pathology WSI and radiology CT image). In this paper, we propose $M^{2}$Fusion: a Bayesian-based multimodal multi-level fusion pipeline for CRC MSI. The proposed fusion model $M^{2}$Fusion is capable of discovering more novel patterns within and across modalities that are beneficial for predicting MSI than using a single modality alone, as well as other fusion methods. The contribution of the paper is three-fold: (1) $M^{2}$Fusion is the first pipeline of multi-level fusion on pathology WSI and 3D radiology CT image for MSI prediction; (2) CT images are the first time integrated into multimodal fusion for CRC MSI prediction; (3) feature-level fusion strategy is evaluated on both Transformer-based and CNN-based method. Our approach is validated on cross-validation of 352 cases and outperforms either feature-level (0.8177 vs. 0.7908) or decision-level fusion strategy (0.8177 vs. 0.7289) on AUC score.