Key challenges in developing underwater acoustic localization methods stem from the combined effects of high reverberation and intricate environments. To address these challenges, recent studies have shown that neural networks with a properly designed architecture can achieve unprecedented localization capabilities and enhanced accuracy. However, the robustness of such methods to environmental mismatch is typically hard to characterize and is usually assessed only empirically. In this work, we consider the recently proposed data-driven method [19], which is based on a deep convolutional neural network, and demonstrate that it can learn to localize in complex and mismatched environments. To explain this robustness, we provide an upper bound on the localization mean squared error (MSE) in the ``true'' environment in terms of the MSE in a ``presumed'' environment plus an additional penalty term related to the environmental discrepancy. Our theoretical results are corroborated by simulations in a rich, highly reverberant, and mismatched channel.