Abstract: Most existing instance segmentation methods focus only on 2D objects and are not suitable for 3D scenes such as autonomous driving. In this paper, we propose a model that splits instance segmentation and object detection into two parallel branches. We discretize object depth into "depth categories" (background is set to 0, objects are set to values in [1, K]), which transforms the instance segmentation task into a pixel-level classification task. The mask branch predicts pixel-level "depth categories" and the 3D branch predicts instance-level "depth categories"; we produce each instance mask by assigning to that instance the pixels that share its "depth category". In addition, to address the imbalance between mask labels and 3D labels in the KITTI dataset (200 for mask, 7481 for 3D), we train the mask branch on unreal masks, that is, masks generated by another instance segmentation method rather than annotated by hand. Despite the use of unreal mask labels, experiments on the KITTI dataset show that the model still achieves state-of-the-art performance in vehicle instance segmentation.
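The depth-category idea can be illustrated with a minimal NumPy sketch. It is not the model described above: the uniform binning over a range `[d_min, d_max]`, the per-instance 2D boxes used to limit the match, and the function names `depth_to_category` and `assign_instance_masks` are all assumptions made for illustration. The abstract states only that pixels sharing an instance's depth category are assigned to that instance.

```python
import numpy as np


def depth_to_category(depth, k, d_min, d_max):
    """Map metric depth to integer categories 1..k.

    Uniform bins over [d_min, d_max] are an assumption; the abstract does
    not say how depth is discretized. Category 0 is reserved for background.
    """
    depth = np.asarray(depth, dtype=np.float64)
    scaled = (depth - d_min) / (d_max - d_min) * k
    return np.clip(np.floor(scaled).astype(np.int64) + 1, 1, k)


def assign_instance_masks(pixel_cat, inst_cats, inst_boxes):
    """Build an instance-id map from pixel-level and instance-level categories.

    pixel_cat:  (H, W) int array from the mask branch, 0 = background.
    inst_cats:  one depth category per instance, from the 3D branch.
    inst_boxes: hypothetical 2D boxes (x0, y0, x1, y1), upper bounds exclusive,
                used here only to keep instances at equal depth apart.
    Returns an (H, W) int array with ids 1..N and 0 for background.
    """
    out = np.zeros_like(pixel_cat)
    for idx, (cat, box) in enumerate(zip(inst_cats, inst_boxes), start=1):
        x0, y0, x1, y1 = box
        region = pixel_cat[y0:y1, x0:x1]
        sub = out[y0:y1, x0:x1]  # view into out, so writes land in out
        # Claim pixels whose category matches and that are still unassigned.
        sub[(region == cat) & (sub == 0)] = idx
    return out


if __name__ == "__main__":
    print(depth_to_category([5.0, 45.0, 80.0], k=8, d_min=0.0, d_max=80.0))
    pixel_cat = np.array([[0, 3, 3, 0],
                          [0, 3, 5, 5],
                          [0, 0, 5, 5]])
    print(assign_instance_masks(pixel_cat, [3, 5], [(1, 0, 3, 2), (2, 1, 4, 3)]))
```

In this toy example, instances are processed in order and an already-claimed pixel is not overwritten; that tie-breaking rule is likewise an illustrative choice rather than something the abstract specifies.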