Global localization and kidnapping are two challenging problems in robot localization. The popular method, Monte Carlo Localization (MCL) addresses the problem by sampling uniformly over the state space, which is unfortunately inefficient when the environment is large. To better deal with the the problems, we present a proposal model, named Deep Multimodal Observation Model (DMOM). DMOM takes a map and a 2D laser scan as inputs and outputs a conditional multimodal probability distribution of the pose, making the samples more focusing on the regions with higher likelihood. With such samples, the convergence is expected to be much efficient. Considering that learning based Samplable Observation Model may fail to capture the true pose sometimes, we furthermore propose the Adaptive Mixture MCL, which adaptively selects updating mode for each particle to tolerate this situation. Equipped with DMOM, Adaptive Mixture MCL can achieve more accurate estimation, faster convergence and better scalability compared with previous methods in both synthetic and real scenes. Even in real environment with long-term changing, Adaptive Mixture MCL is able to localize the robot using DMON trained only on simulated observations from a SLAM map, or even a blueprint map.