In this work, we consider estimating user positions in a spatially distributed antenna system (DAS) from the uplink channel state information (CSI). As the number of remote radio heads (RRHs) grows, however, collecting CSI at a central unit (CU) can significantly increase both the fronthaul overhead and the computational complexity at the CU. This problem can be mitigated by using only a subset of the RRHs. We therefore present a deep learning-based approach to selecting a subset of RRHs for wireless localization. We employ an RRH selection layer that is trained jointly with the rest of the network, so that the model parameters and the set of selected RRHs are learned together. We show that this selection strategy comes at a relatively small cost in localization performance. Moreover, compared with a trivial baseline that selects RRHs by maximizing the channel gain, the proposed method yields significant performance gains in a propagation environment dominated by non-line-of-sight conditions.
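For illustration, the channel-gain baseline mentioned above can be sketched as follows. This is a minimal sketch of one plausible reading, not the exact procedure used in this work: it assumes the CSI is stored as a complex array of shape (samples, RRHs, antennas), that the gain of an RRH is its squared channel magnitude averaged over samples and antennas, and that a fixed number of RRHs is kept. The function name `select_rrhs_by_gain`, the array layout, and the synthetic data are all hypothetical.

```python
import numpy as np


def select_rrhs_by_gain(csi, num_selected):
    """Return the indices of the RRHs with the largest average channel gain.

    csi: complex array, assumed shape (num_samples, num_rrhs, num_antennas).
    num_selected: number of RRHs to keep.
    """
    # Per-RRH gain: squared magnitude averaged over samples and antennas.
    gain = np.mean(np.abs(csi) ** 2, axis=(0, 2))
    # Take the num_selected strongest RRHs, then sort their indices.
    strongest = np.argsort(gain)[::-1][:num_selected]
    return np.sort(strongest)


# Synthetic example (hypothetical data): 8 RRHs with different amplitude scales.
rng = np.random.default_rng(0)
num_samples, num_rrhs, num_antennas = 200, 8, 4
scale = np.array([0.1, 1.0, 0.3, 2.0, 0.2, 0.05, 1.5, 0.4])
shape = (num_samples, num_rrhs, num_antennas)
fading = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
csi = scale[None, :, None] * fading

selected = select_rrhs_by_gain(csi, num_selected=3)
# Only the CSI of the selected RRHs would be forwarded to the CU.
csi_subset = csi[:, selected, :]
```

Such a rule needs no training, but it ranks RRHs by received power alone and ignores how informative each RRH is about the user position, which is where a learned selection layer can differ.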