Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space and thereby reduce acquisition times at the cost of decreased image quality. In this work, we directly learn the undersampling masks to derive task- and domain-specific patterns. To solve this discrete optimization challenge, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks. Furthermore, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.