Abstract:The abundance of dark matter subhalos orbiting a host galaxy is a generic prediction of the cosmological framework. It is a promising way to constrain the nature of dark matter. Here we describe the challenges of detecting stars whose phase-space distribution may be perturbed by the passage of dark matter subhalos using a machine learning approach. The training data are three Milky Way-like galaxies and nine synthetic Gaia DR2 surveys derived from these. We first quantify the magnitude of the perturbations in the simulated galaxies using an anomaly detection algorithm. We also estimate the feasibility of this approach in the Gaia DR2-like catalogues by comparing the anomaly detection based approach with a supervised classification. We find that a classification algorithm optimized on about half a billion synthetic star observables exhibits mild but nonzero sensitivity. This classification-based approach is not sufficiently sensitive to pinpoint the exact locations of subhalos in the simulation, as would be expected from the very limited number of subhalos in the detectable region. The enormous size of the Gaia dataset motivates the further development of scalable and accurate computational methods that could be used to select potential regions of interest for dark matter searches to ultimately constrain the Milky Way's subhalo mass function.