Abstract:In this paper we propose a reinforcement learning based weakly supervised system for localisation. We train a controller function to localise regions of interest within an image by introducing a novel reward definition that utilises non-binarised classification probability, generated by a pre-trained binary classifier which classifies object presence in images or image crops. The object-presence classifier may then inform the controller of its localisation quality by quantifying the likelihood of the image containing an object. Such an approach allows us to minimize any potential labelling or human bias propagated via human labelling for fully supervised localisation. We evaluate our proposed approach for a task of cancerous lesion localisation on a large dataset of real clinical bi-parametric MR images of the prostate. Comparisons to the commonly used multiple-instance learning weakly supervised localisation and to a fully supervised baseline show that our proposed method outperforms the multi-instance learning and performs comparably to fully-supervised learning, using only image-level classification labels for training.