In this report, we introduce our (pretty straightforard) two-step "detect-then-match" video instance segmentation method. The first step performs instance segmentation for each frame to get a large number of instance mask proposals. The second step is to do inter-frame instance mask matching with the help of optical flow. We demonstrate that with high quality mask proposals, a simple matching mechanism is good enough for tracking. Our approach achieves the first place in the UVO 2021 Video-based Open-World Segmentation Challenge.