Existing template-based trackers usually localize the target in each frame with a bounding box. This limits their ability to learn pixel-wise representations and to handle complex, non-rigid transformations of the target. Furthermore, existing segmentation-based tracking methods remain insufficient at modeling and exploiting the dense correspondence of target pixels across frames. To overcome these limitations, this work presents a novel discriminative segmentation tracking architecture equipped with dual memory banks, i.e., an appearance memory bank and a spatial memory bank. In particular, the appearance memory bank uses spatial and temporal non-local similarity to propagate the segmentation mask to the current frame. We further treat a discriminative correlation filter as the spatial memory bank, which stores the mapping between the feature map and the spatial map. Without bells and whistles, our simple yet effective tracking architecture sets a new state of the art on the VOT2016, VOT2018, VOT2019, GOT-10K and TrackingNet benchmarks, notably achieving EAOs of 0.535 and 0.506 on VOT2016 and VOT2018, respectively. Moreover, our approach outperforms the leading segmentation tracker D3S on two video object segmentation benchmarks, DAVIS16 and DAVIS17. The source code will be released at https://github.com/phiphiphi31/DMB.
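A generic non-local (attention-style) read-out can illustrate how an appearance memory bank propagates a mask. The sketch below rests on assumptions and is not the DMB implementation. The function name `propagate_mask`, the cosine similarity, the `temperature` parameter and the array shapes are all hypothetical choices. Each current-frame pixel is compared with every pixel of every stored frame, which covers both the spatial and the temporal dimension of the non-local similarity. Its mask value is then the softmax-weighted average of the stored mask values.

```python
import numpy as np


def propagate_mask(query_feat, memory_feats, memory_masks, temperature=1.0):
    """Hypothetical non-local mask propagation (illustrative sketch only).

    query_feat:   (C, H, W) features of the current frame.
    memory_feats: (T, C, H, W) features of T memory frames.
    memory_masks: (T, H, W) soft masks in [0, 1] for the memory frames.
    Returns an (H, W) propagated mask for the current frame.
    """
    C, H, W = query_feat.shape
    # Queries: one C-dim feature vector per current-frame pixel -> (HW, C).
    q = query_feat.reshape(C, H * W).T
    # Keys: all pixels of all memory frames, flattened in (t, h, w) order -> (C, THW).
    k = memory_feats.transpose(1, 0, 2, 3).reshape(C, -1)
    # Values: the stored mask values, flattened in the same (t, h, w) order -> (THW,).
    v = memory_masks.reshape(-1)

    # L2-normalise along channels so the dot product is cosine similarity.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)

    # Spatio-temporal non-local similarity between every query and every memory pixel.
    sim = (q @ k) / temperature                    # (HW, THW)
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability for exp
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over memory pixels

    # Each current pixel receives a weighted average of memory mask values.
    return (weights @ v).reshape(H, W)
```

The weights in each row form a softmax, so every output pixel is a convex combination of stored mask values and stays within their range. A lower `temperature` sharpens the match toward the single most similar memory pixel.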