Abstract:The construction industry has been traditionally slow in adopting digital technologies. However, these are becoming increasingly necessary due to a plentitude of challenges, such as a shortage of skilled labor and decreasing productivity levels compared to other industries. Autonomous robotic systems can alleviate this problem, but the software development process for these systems is heavily driven by data, a resource usually challenging to find in the construction domain due to the lack of public availability. In our work, we therefore provide a dataset of 14,805 RGB images with segmentation labels for reinforced concrete construction and make it publicly available. We conduct a detailed analysis of our dataset and discuss how to deal with labeling inconsistencies. Furthermore, we establish baselines for the YOLOv8L-seg, DeepLabV3, and U-Net segmentation models and investigate the influence of data availability and label inconsistencies on the performance of these models. Our study showed that the models are precise in their predictions but would benefit from more data to increase the number of recalled instances. Label inconsistencies had a negligible effect on model performance, and we, therefore, advocate for a crowd-sourced dataset to boost the development of autonomous robotic systems in the construction industry.