Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems, offering a good approximation of prior distributions of real-world image data. Typically, diffusion models rely on large-scale clean signals to accurately learn the score functions of ground truth clean image distributions. However, such a requirement for large amounts of clean data is often impractical in real-world applications, especially in fields where data samples are expensive to obtain. To address this limitation, in this work, we introduce \emph{FlowDiff}, a novel joint training paradigm that leverages a conditional normalizing flow model to facilitate the training of diffusion models on corrupted data sources. The conditional normalizing flow try to learn to recover clean images through a novel amortized inference mechanism, and can thus effectively facilitate the diffusion model's training with corrupted data. On the other side, diffusion models provide strong priors which in turn improve the quality of image recovery. The flow model and the diffusion model can therefore promote each other and demonstrate strong empirical performances. Our elaborate experiment shows that FlowDiff can effectively learn clean distributions across a wide range of corrupted data sources, such as noisy and blurry images. It consistently outperforms existing baselines with significant margins under identical conditions. Additionally, we also study the learned diffusion prior, observing its superior performance in downstream computational imaging tasks, including inpainting, denoising, and deblurring.