Ultrasound b-mode imaging is a qualitative approach and diagnostic quality strongly depends on operators' training and experience. Quantitative approaches can provide information about tissue properties; therefore, can be used for identifying various tissue types, e.g., speed-of-sound in the tissue can be used as a biomarker for tissue malignancy, especially in breast imaging. Recent studies showed the possibility of speed-of-sound reconstruction using deep neural networks that are fully trained on simulated data. However, because of the ever present domain shift between simulated and measured data, the stability and performance of these models in real setups are still under debate. In this study, we investigated the impacts of training data diversity on the robustness of these networks by using multiple kinds of geometrical and natural simulated phantom structures. On the simulated data, we investigated the performance of the networks on out-of-domain echogenicity, geometries, and in the presence of noise. We further inspected the stability of employing such tissue modeling in a real data acquisition setup. We demonstrated that training the network with a joint set of datasets including both geometrical and natural tissue models improves the stability of the predicted speed-of-sound values both on simulated and measured data.