We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the KL-divergence. Based on this idea, there are two challenges that need to be addressed. First, the computational complexity of KL-divergence is unacceptable when the dimensions of distributions increases. The Monte-Carlo Marginalization (MCMarg) is proposed to address this issue. The second challenge is the differentiability of the optimization process, since the target distribution is intractable. We handle this problem by using Kernel Density Estimation (KDE). The proposed approach is a powerful tool to learn complex distributions and the entire process is differentiable. Thus, it can be a better substitute of the variational inference in variational auto-encoders (VAE). One strong evidence of the benefit of our method is that the distributions learned by the proposed approach can generate better images even based on a pre-trained VAE's decoder. Based on this point, we devise a distribution learning auto-encoder which is better than VAE under the same network architecture. Experiments on standard dataset and synthetic data demonstrate the efficiency of the proposed approach.