Active learning (AL) could contribute to solving critical environmental problems through improved spatio-temporal predictions. Yet such predictions involve high-dimensional feature spaces with mixed data types and missing data, which existing methods struggle to handle. Here, we propose a novel batch AL method that fills this gap. We encode and cluster the features of candidate data points, and query the best data points based on the distance of their embedded features to their cluster centers. We introduce a new metric of informativeness that we call embedding entropy, and a general class of neural networks, which we call embedding networks, that can use it. Empirical tests on electricity demand forecasting show simultaneous reductions of up to 63-88% in prediction error and up to 50-69% in data usage compared to passive learning (PL) benchmarks.
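
As a rough illustration of the query step, the sketch below assumes the candidate features have already been encoded into fixed-length embeddings; the encoder and the embedding entropy metric are not shown. It clusters the embeddings with plain k-means, using one cluster per batch slot, and picks one candidate per cluster by its distance to the cluster center. The function names `kmeans` and `select_batch`, the use of k-means, and the `pick` parameter are assumptions made for illustration. The abstract does not state whether the closest or the farthest points are queried, so both options are exposed.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialise centers from k distinct candidates (assumed scheme).
    centers = [list(p) for p in rng.sample(points, k)]

    def assign(current):
        # Label each point with the index of its nearest center.
        return [min(range(k), key=lambda j: math.dist(p, current[j]))
                for p in points]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # Keep the old center if a cluster lost all its members.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


def select_batch(embeddings, batch_size, pick="closest", seed=0):
    """Return indices of at most one candidate per cluster.

    pick="closest" queries the point nearest its cluster center,
    pick="farthest" the most distant one (both are assumptions).
    """
    if pick not in ("closest", "farthest"):
        raise ValueError("pick must be 'closest' or 'farthest'")
    centers, labels = kmeans(embeddings, batch_size, seed=seed)
    chooser = min if pick == "closest" else max
    batch = []
    for j in range(batch_size):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            batch.append(
                chooser(members, key=lambda i: math.dist(embeddings[i], centers[j]))
            )
    return batch


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoded candidate features.
    toy = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0], [9.2, 0.3]]
    print(select_batch(toy, batch_size=3))
```

The returned indices identify the candidates whose labels would be requested in the next batch. Because k-means can converge to a local optimum, the selected points depend on the seed.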