Diffusion models have achieved phenomenal success as expressive priors for solving inverse problems, but their extension beyond natural images to more structured scientific domains remains limited. Motivated by applications in materials science, we aim to reduce the number of measurements required from an expensive imaging modality of interest by leveraging side information from an auxiliary modality that is much cheaper to obtain. To deal with the non-differentiable, black-box nature of the forward model, we propose a framework that trains a multimodal diffusion model over the joint modalities, turning inverse problems with black-box forward models into simple linear inpainting problems. Numerically, we demonstrate the feasibility of training diffusion models on materials imagery data, and show that our approach achieves superior image reconstruction by leveraging the available side information, requiring significantly less data from the expensive microscopy modality.
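
To make the inpainting reduction concrete, the following is a minimal sketch of replacement-based conditional sampling with a joint diffusion model, in the spirit of the approach described above. The `denoiser` function, the noise schedule, and the tensor shapes are illustrative assumptions rather than the paper's actual implementation: the observed (cheap-modality) entries are clamped at every reverse step, so the model only has to inpaint the expensive modality.

```python
# Minimal sketch (assumptions): replacement-based inpainting sampling with a
# joint diffusion model. `denoiser(x_t, t)` is a hypothetical noise-prediction
# network trained over the concatenated modalities.
import torch

def inpaint_sample(denoiser, y_obs, mask, T=1000, shape=(1, 2, 64, 64)):
    """Reverse diffusion where observed (cheap-modality) entries are clamped.

    y_obs : tensor holding the known entries (e.g. the auxiliary modality)
    mask  : 1 where y_obs is observed, 0 where we must reconstruct
    """
    betas = torch.linspace(1e-4, 2e-2, T)          # illustrative linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                          # start from pure noise
    for t in reversed(range(T)):
        a, ab = alphas[t], alpha_bars[t]
        eps = denoiser(x, t)                        # predicted noise
        mean = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise     # ancestral DDPM step

        # Clamp observed entries to a correspondingly noised copy of y_obs,
        # so the joint model only fills in the expensive modality.
        if t > 0:
            ab_prev = alpha_bars[t - 1]
            y_noised = (torch.sqrt(ab_prev) * y_obs
                        + torch.sqrt(1 - ab_prev) * torch.randn_like(y_obs))
        else:
            y_noised = y_obs
        x = mask * y_noised + (1 - mask) * x
    return x
```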