Deep learning (DL) stereo matching methods have gained great attention for remote sensing satellite datasets. However, most existing studies base their assessments on only one or a few stereo pairs and lack a systematic evaluation of how robust DL methods are on satellite stereo images with varying radiometric and geometric configurations. This paper evaluates four DL stereo matching methods on hundreds of multi-date, multi-site satellite stereo pairs with varying geometric configurations, against the traditional, well-practiced Census-SGM (semi-global matching), to comprehensively understand their accuracy, robustness, generalization capabilities, and practical potential. The DL methods include a learning-based cost metric computed by convolutional neural networks (MC-CNN) followed by SGM, and three end-to-end (E2E) learning models: Geometry and Context Network (GCNet), Pyramid Stereo Matching Network (PSMNet), and LEAStereo. Our experiments show that E2E algorithms can reach the upper limit of geometric accuracy but may not generalize well to unseen data. The learning-based cost metric and Census-SGM are rather robust and consistently achieve acceptable results. All DL algorithms are robust to the geometric configurations of stereo pairs and are less sensitive to them than Census-SGM, while learning-based cost metrics can generalize to satellite images when trained on different datasets (airborne or ground-view).