Abstract:Semantic segmentation of point clouds is essential for understanding the built environment, and a large amount of high-quality data is required for training deep learning models. Despite synthetic point clouds (SPC) having the potential to compensate for the shortage of real data, how to exploit the benefits of SPC is still open. Therefore, this study systematically investigates how color and mixing proportion of SPC impact semantic segmentation for the first time. First, a new method to mimic the scanning process and generate SPC based on BIM is proposed, to create a synthetic dataset with consistent colors of BIM (UniSPC) and a synthetic dataset with real colors (RealSPC) respectively. Subsequently, by integrating with the S3DIS dataset, further experiments on PointNet, PointNet++, and DGCNN are conducted. Meanwhile, benchmark experiments and new evaluation metrics are introduced to better evaluate the performance of different models. Experiments show that synthetic color significantly impacts model performance, the performance for common components of the models trained with pure RealSPC is comparable to models with real data, and RealSPC contributes average improvements of 14.1% on overall accuracy and 7.3% on mIoU than UniSPC. Furthermore, the proportion of SPC also has a significant impact on the performance. In mixing training experiments, adding more than 70% SPC achieves an average of 3.9% on overall accuracy and 3.4% on mIoU better than benchmark on three models. It is also revealed that for large flat elements such as floors, ceilings, and walls, the SPC can even replace real point clouds without compromising model performance.