In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the spatial relationships of instances in a 3D space. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers regarding (1) occlusion order that identifies occluder/occludee and (2) depth order that describes ordinal relations that consider relative distance from the camera. The dataset provides joint annotation of two kinds of orderings for the same instances, and we discover that the occlusion order and depth order are complementary. We also introduce a geometric order prediction network called InstaOrderNet, which is superior to state-of-the-art approaches. Moreover, we propose InstaDepthNet that uses auxiliary geometric order loss to boost the instance-wise depth prediction accuracy of MiDaS. These contributions to geometric scene understanding will help to improve the accuracy of various computer vision tasks.