In this paper, a robust visual tracking approach via mixed model based convolutional neural networks (SDT) is developed. In order to handle abrupt or fast motion, a prior map is generated to facilitate the localization of region of interest (ROI) before the deep tracker is performed. A top-down saliency model with nineteen shallow cues are employed to construct the prior map with online learnt combination weights. Moreover, apart from a holistic deep learner, four local networks are also trained to learn different components of the target. The generated four local heat maps will facilitate to rectify the holistic map by eliminating the distracters to avoid drifting. Furthermore, to guarantee the instance for online update of high quality, a prioritised update strategy is implemented by casting the problem into a label noise problem. The selection probability is designed by considering both confidence values and bio-inspired memory for temporal information integration. Experiments are conducted qualitatively and quantitatively on a set of challenging image sequences. Comparative study demonstrates that the proposed algorithm outperforms other state-of-the-art methods.