To this date the safety assessment of materials, used for example in the nuclear power sector, commonly relies on a fracture mechanical analysis utilizing macroscopic concepts, where a global load quantity K or J is compared to the materials fracture toughness curve. Part of the experimental effort involved in these concepts is dedicated to the quantitative analysis of fracture surfaces. Within the scope of this study a methodology for the semi-supervised training of deep learning models for fracture surface segmentation on a macroscopic level was established. Therefore, three distinct and unique datasets were created to analyze the influence of structural similarity on the segmentation capability. The structural similarity differs due to the assessed materials and specimen, as well as imaging-induced variance due to fluctuations in image acquisition in different laboratories. The datasets correspond to typical isolated laboratory conditions, complex real-world circumstances, and a curated subset of the two. We implemented a weak-to-strong consistency regularization for semi-supervised learning. On the heterogeneous dataset we were able to train robust and well-generalizing models that learned feature representations from images across different domains without observing a significant drop in prediction quality. Furthermore, our approach reduced the number of labeled images required for training by a factor of 6. To demonstrate the success of our method and the benefit of our approach for the fracture mechanics assessment, we utilized the models for initial crack size measurements with the area average method. For the laboratory setting, the deep learning assisted measurements proved to have the same quality as manual measurements. For models trained on the heterogeneous dataset, very good measurement accuracies with mean deviations smaller than 1 % could be achieved...