Abstract:Siamese Networks are one of most popular visual object tracking methods for their high speed and high accuracy tracking ability as long as the target is well identified. However, most Siamese Network based trackers use the first frame as the ground truth of an object and fail when target appearance changes significantly in next frames. They also have dif iculty distinguishing the target from similar other objects in the frame. We propose two ideas to solve both problems. The first idea is using a bag of dynamic templates, containing diverse, similar, and recent target features and continuously updating it with diverse target appearances. The other idea is to let a network learn the path history and project a potential future target location in a next frame. This tracker achieves state-of-the-art performance on the long-term tracking dataset UAV20L by improving the success rate by a large margin of 15% (65.4 vs 56.6) compared to the state-of-the-art method, HiFT. The of icial python code of this paper is publicly available.