We propose a novel reference-based video colorization framework that exploits spatiotemporal correspondence. Reference-based methods colorize grayscale frames by referring to a color frame supplied by the user. Existing methods suffer from color leakage between objects and from the emergence of averaged colors, both of which stem from non-local semantic correspondence in space. To address these issues, we warp colors only from regions of the reference frame that are restricted by correspondence in time. We propagate masks as temporal correspondences using two complementary tracking approaches: off-the-shelf instance tracking for high-performance segmentation, and a newly proposed dense tracking that handles various types of objects. By restricting color references to temporally related regions, our approach propagates faithful colors throughout the video. Experiments demonstrate that our method outperforms state-of-the-art methods both quantitatively and qualitatively.
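The effect of restricting the reference region can be illustrated with a toy sketch. This is not the implementation used in this work: the function `warp_colors`, its parameters, the softmax temperature, and the fallback for unmatched mask ids are all hypothetical choices made for illustration. Pixels are flattened into lists, features are plain tuples, and mask ids stand in for the propagated masks.

```python
import math


def warp_colors(target_feats, target_masks, ref_feats, ref_colors, ref_masks,
                restrict=True, temperature=0.1):
    """Warp reference colors to target pixels by softmax-weighted averaging.

    With restrict=True, each target pixel attends only to reference pixels
    that carry the same mask id (a stand-in for temporal correspondence).
    With restrict=False, it attends to all reference pixels (purely spatial,
    non-local correspondence).
    """
    channels = len(ref_colors[0])
    warped = []
    for feat, mask_id in zip(target_feats, target_masks):
        candidates = list(range(len(ref_feats)))
        if restrict:
            same = [j for j in candidates if ref_masks[j] == mask_id]
            # Assumption: fall back to all pixels if no region matches.
            candidates = same or candidates
        # Dot-product similarity between target and reference features.
        sims = [sum(a * b for a, b in zip(feat, ref_feats[j])) / temperature
                for j in candidates]
        peak = max(sims)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        warped.append(tuple(
            sum(w * ref_colors[j][c] for w, j in zip(weights, candidates)) / total
            for c in range(channels)
        ))
    return warped


# Two reference pixels that look identical in feature space but belong to
# different objects: a red one (mask 1) and a blue one (mask 2).
ref_feats = [(1.0, 0.0), (1.0, 0.0)]
ref_colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
ref_masks = [1, 2]

# One target pixel with the same feature, tracked to mask 1.
target_feats = [(1.0, 0.0)]
target_masks = [1]

unrestricted = warp_colors(target_feats, target_masks, ref_feats, ref_colors,
                           ref_masks, restrict=False)
restricted = warp_colors(target_feats, target_masks, ref_feats, ref_colors,
                         ref_masks, restrict=True)
print(unrestricted)  # [(0.5, 0.0, 0.5)]: an averaged color
print(restricted)    # [(1.0, 0.0, 0.0)]: the color of the tracked object
```

In this toy case the unrestricted warp blends the two objects into an averaged color, while the mask-restricted warp returns the color of the object the target pixel was tracked to.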