In this paper we address the problem of unsupervised localization of objects in single images. Compared to previous state-of-the-art method our method is fully unsupervised in the sense that there is no prior instance level or category level information about the image. Furthermore, we treat each image individually and do not rely on any neighboring image similarity. We employ deep-learning based generation of saliency maps and region proposals to tackle this problem. First salient regions in the image are determined using an encoder/decoder architecture. The resulting saliency map is matched with region proposals from a class agnostic region proposal network to roughly localize the candidate object regions. These regions are further refined based on the overlap and similarity ratios. Our experimental evaluations on a benchmark dataset show that the method gets close to current state-of-the-art methods in terms of localization accuracy even though these make use of multiple frames. Furthermore, we created a more challenging and realistic dataset with multiple object categories and varying viewpoint and illumination conditions for evaluating the method's performance in real world scenarios.